Acronyms, metrics and tasks in NLP
Wanted to have a summary of the NLP acronyms and common tasks.
Acronyms in NLP
- LM Language Modelling (predicting the next word given the previous context)
- NLP Natural Language Processing
- NLG Natural Language Generation
- NLU Natural Language Understanding
- NLI Natural Language Inference
- NED Named Entity Disambiguation
- NER Named Entity Recognition
- NMT Neural Machine Translation
- BPE Byte Pair Encoding
- NSP Next Sentence Prediction (like BERT)
- MLM Masked Language Model (like BERT)
- CLM Causal Language Model (like GPT-2)
- RTE Recognizing Textual Entailment (pair of sentences, and the task is to predict whether the first entails the second; part of GLUE)
- MRPC Microsoft Research Paraphrase Corpus (pairs of sentences labeled as semantically equivalent or not; part of GLUE)
- PBSMT Phrase-Based Statistical Machine Translation
- WER Word Error Rate
- BERT Bidirectional Encoder Representations from Transformers
- GloVe Global Vectors
- NEL Named Entity Linking
- NERD Named Entity Recognition and Disambiguation
- NEN Named Entity Normalization
- CoLA Corpus of Linguistic Acceptability (is the sentence grammatically acceptable or not; T5 uses this; part of GLUE)
- SWAG Situations With Adversarial Generations
- WNLI Winograd NLI
- STS Semantic Textual Similarity
- SST Stanford Sentiment Treebank (movie reviews labeled positive or negative; part of GLUE as SST-2)
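WER from the list above is simple enough to compute by hand, so here is a minimal sketch. It assumes whitespace tokenization and the usual definition (substitutions + deletions + insertions, divided by the number of reference words). The function name `word_error_rate` is my own and not taken from any library.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (row-by-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` drops one of six reference words, so it gives 1/6. WER can exceed 1.0 when the hypothesis contains many insertions.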
Metrics and benchmarks
- BLEU Bilingual Evaluation Understudy
- ROUGE Recall-Oriented Understudy for Gisting Evaluation
- GLUE General Language Understanding Evaluation (benchmark consisting of 9 tasks)
- SQuAD Stanford Question Answering Dataset (v1.1 and v2.0)
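- GLEU Google-BLEU (sentence-level BLEU variant); also expanded as Generalized Language Evaluation Understanding (BLEU variant for grammatical error correction)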
- seqeval Sequence labeling evaluation (Python framework for scoring tasks such as NER and chunking)
- XNLI Cross-lingual Natural Language Inference
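To make the difference between BLEU and ROUGE concrete: BLEU is built on clipped n-gram precision (how much of the candidate appears in the reference), while ROUGE-N is recall-oriented (how much of the reference appears in the candidate). The sketch below shows only those two building blocks for a single reference with whitespace tokenization. It is not full BLEU (no brevity penalty, no geometric mean over n-gram orders) and not a full ROUGE implementation. The function names are my own.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def _overlap(candidate: str, reference: str, n: int):
    """Count n-grams, clipping each candidate count at its reference count."""
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap, sum(cand.values()), sum(ref.values())


def clipped_precision(candidate: str, reference: str, n: int = 1) -> float:
    """BLEU-style modified precision: overlap / candidate n-grams."""
    overlap, cand_total, _ = _overlap(candidate, reference, n)
    return overlap / cand_total if cand_total else 0.0


def ngram_recall(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N-style recall: overlap / reference n-grams."""
    overlap, _, ref_total = _overlap(candidate, reference, n)
    return overlap / ref_total if ref_total else 0.0
```

With the candidate "the the the the the the the" and the reference "the cat is on the mat", clipping limits the credit for "the" to its two occurrences in the reference, so the unigram precision is 2/7 rather than 7/7.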
Tasks in NLP
- Automatic summarization
- Collocation extraction
- Information extraction
- Entity linking
- Natural language parsing
- Part-of-speech tagging
- Phrase chunking
- Question answering
- Relationship extraction
- Semantic parsing
- Stemming
- Shallow parsing
- Text segmentation
- Textual entailment
- Text simplification
- Truecasing
- Terminology extraction
- Lemmatisation
- Downstream tasks (the tasks a pretrained model is fine-tuned on)
- Word-sense disambiguation
- POS Part-of-speech tagging (assigning tags such as noun, verb or adjective to tokens)
- Constituent labeling
- Dependency labeling
- Named entity labeling
- SRL Semantic role labeling
- Coreference resolution (e.g. "Obama" and "the former president" refer to the same entity)
- SPR Semantic proto-role labeling
- Relation classification
Links
- http://nlpprogress.com/english/common_sense.html
- https://paperswithcode.com/area/natural-language-processing
- https://arxiv.org/abs/1905.06316