# Natural Language Processing
### Preprocessing
###### Tokenizer
- spaCy / NLTK / ...
- Byte-Pair Encoding (SentencePiece / Hugging Face ...)
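
To make the Byte-Pair Encoding idea concrete, here is a minimal sketch of the merge-learning loop in plain Python. It is a toy illustration under simplifying assumptions (character-level start symbols, no end-of-word marker, ties broken by first occurrence) and not how SentencePiece or the Hugging Face tokenizers are implemented. The name `learn_bpe` is hypothetical.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words (toy sketch)."""
    # Represent each distinct word as a tuple of symbols with its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties go to the pair seen first (assumption).
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
```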
###### Vocabulary
- Token frequencies/counts are helpful, e.g. for dropping rare tokens or capping the vocabulary size
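
One possible way to use counts when building a vocabulary is sketched below. The function name `build_vocab`, the `min_count` threshold, and the `<unk>` token are assumptions made for this example, not something these notes prescribe.

```python
from collections import Counter


def build_vocab(tokenized_docs, min_count=2, unk_token="<unk>"):
    """Map tokens to integer ids, keeping only tokens seen at least `min_count` times."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    # Index 0 is reserved for unknown tokens; the rest are sorted by frequency.
    itos = [unk_token] + [tok for tok, c in counts.most_common() if c >= min_count]
    stoi = {tok: i for i, tok in enumerate(itos)}
    return stoi, counts


docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat"]]
stoi, counts = build_vocab(docs)

# Rare or unseen tokens fall back to the <unk> id.
ids = [stoi.get(tok, stoi["<unk>"]) for tok in ["the", "dog"]]
```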
### Text representations
###### Word embeddings
###### Bag of words [1,0,0,0,1]
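A vector such as `[1,0,0,0,1]` can be produced by marking which vocabulary entries occur in a text. The sketch below assumes a hypothetical five-word vocabulary. With `binary=False` it returns counts instead of 0/1 indicators.

```python
def bag_of_words(tokens, vocab, binary=True):
    """Return one entry per vocabulary word: presence (0/1) or count."""
    index = {tok: i for i, tok in enumerate(vocab)}
    vec = [0] * len(vocab)
    for tok in tokens:
        if tok in index:  # out-of-vocabulary tokens are ignored
            i = index[tok]
            vec[i] = 1 if binary else vec[i] + 1
    return vec


# Hypothetical vocabulary, chosen only to reproduce the vector in the heading.
vocab = ["cat", "dog", "bird", "fish", "sat"]
vec = bag_of_words(["cat", "sat"], vocab)
print(vec)  # [1, 0, 0, 0, 1]
```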
### Common NLP Architectures
###### Encoder-Decoder (Seq2Seq, Tree2Seq, ...)
###### Encoder
###### Siamese Network
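The defining property of a Siamese network is that both inputs pass through the same encoder (shared weights) before being compared. The sketch below illustrates only that structure: the encoder is an untrained average of toy word embeddings with made-up values, and cosine similarity is one common choice of comparison. All names here are hypothetical.

```python
import math

# Toy 2-d embedding table; the values are invented for illustration.
EMBEDDINGS = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}


def encode(tokens):
    """Shared encoder: average the embeddings of the known tokens."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0]
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def siamese_similarity(left, right):
    # Both sides go through the *same* encode function.
    return cosine(encode(left), encode(right))


close = siamese_similarity(["cat"], ["dog"])
far = siamese_similarity(["cat"], ["car"])
```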
### Transformer
###### Attention
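As a reference point, scaled dot-product attention computes `softmax(Q K^T / sqrt(d_k)) V`. The following is a dependency-free sketch of that formula for small lists of vectors. It omits masking, multiple heads, and learned projections, and the function names are chosen for this example only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


# A zero query scores all keys equally, so the output is the mean of the values.
out = attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
```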
###### Architecture