What is GPT-2?

OpenAI released a new language model that automatically synthesizes text, called GPT-2. Here is the model repo. In simple terms, the model predicts the next word. Nothing fancy. But it does that well enough that the result can look almost like human writing.
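
For concreteness, here is a minimal sketch of that next-word prediction step. It assumes the Hugging Face transformers library and the publicly released small "gpt2" checkpoint, neither of which comes from the repo above.

```python
# A minimal next-word prediction sketch (assumes the Hugging Face
# transformers library and the small public "gpt2" checkpoint).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The quick brown fox jumps over the"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits            # shape: (1, seq_len, vocab_size)

next_token_id = int(logits[0, -1].argmax())    # most likely next token
print(tokenizer.decode([next_token_id]))       # e.g. " lazy"
```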

If you pay closer attention, you will notice that the text generated by GPT-2 is technically fluent but often logically flawed.

Why is it called GPT? GPT stands for Generative Pre-trained Transformer. At first you may wonder why such a model matters at all, since it is just a next-word predictor.

It turns out that if you can predict the next word, you can do many other things based on that alone.

Almost any task in NLP (Natural Language Processing) can be approached with next-word prediction models. You can say they are the foundation for almost any NLP task, such as the following (a short prompting sketch follows the list):

  • Text generation
  • Reading comprehension
  • Sentence annotation
  • Semantic role labeling and parsing
  • Named entity recognition
  • Constituency parsing
  • Dependency parsing
  • Information extraction
  • Passage annotation
  • Coreference resolution
  • Question answering
  • Text to SQL
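
As a small illustration of how one of these tasks reduces to next-word prediction, here is a hedged sketch of question answering via prompting: you write the question as text and let the model continue it word by word. Again, the transformers library and the small "gpt2" checkpoint are assumptions, not part of the original post.

```python
# Question answering recast as next-word prediction via prompting
# (assumes the Hugging Face transformers library and the "gpt2" checkpoint).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Question: What is the capital of France?\nAnswer:"
result = generator(prompt, max_new_tokens=5, num_return_sequences=1)
print(result[0]["generated_text"])
```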

Some facts:

The GPT-2 model has 1.5 billion parameters and was trained on 40GB of text data. However, the full model is not publicly available; instead, a smaller version is what you can try and test.
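
You can check that gap yourself. The sketch below, which again assumes the transformers library and the small public "gpt2" checkpoint, counts the parameters of the released model and shows it is far below the 1.5 billion of the full one.

```python
# Count the parameters of the publicly released small GPT-2 checkpoint
# (assumes the Hugging Face transformers library).
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.0f}M parameters")   # roughly 124M, well below 1.5B
```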