- How to start with Transformers?
- Major objects in use
- Using the pre-trained bins for GPT2
- Picking the right model
- Generating text using GPT-2
- Why did we use the GPT2LMHeadModel?
How to start with Transformers?
First install the Transformers from Hugging Face.
!pip install -q git+https://github.com/huggingface/transformers.git
Then I would assume you will be using either TensorFlow or PyTorch. Make sure you have one of these installed and check the version you have.
import tensorflow as tf print(tf.__version__)
import torch print(torch.__version__)
Major objects in use
There are three kind of major classes/objects in Transformers:
- configuration class
- model class and
- tokenizer class
Using the pre-trained bins for GPT2
All three: the model, the configuration and the tokenizer you can load using the
from_pretrained method. Also they can be saved using the
from transformers import GPT2Model, GPT2Tokenizer gpt2_model = GPT2LMHeadModel.from_pretrained("gpt2-large")# auto loads the config gpt2_tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
If on TensorFlow use
You may soon note the tokenizer class is the same for TensorFlow and PyTorch but the TensorFlow model has the TF prefix (TFBertModel). This is because at first the library was called PyTorch Transformers and it was originally created in PyTorch. Later on they added TF prefix for all model class names to be used in TensorFlow.
Picking the right model
There are several GPT2 models to peak:
# "gpt2": "https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin" # "gpt2-medium": "https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-medium-pytorch_model.bin" # "gpt2-large": "https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-large-pytorch_model.bin" # "gpt2-xl": "https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-xl-pytorch_model.bin" # "distilgpt2": "https://s3.amazonaws.com/models.huggingface.co/bert/distilgpt2-pytorch_model.bin"
All you need to do if you would like to check the distilled GPT-2 is to write:
gpt2_model = GPT2LMHeadModel.from_pretrained("distilgpt2")
Generating text using GPT-2
Let’s use the GTP-2 large model.
from transformers import GPT2LMHeadModel, GPT2Tokenizer gpt2_model = GPT2LMHeadModel.from_pretrained("gpt2-large") gpt2_tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
You can get the number of parameters for the model like this:
sum(p.numel() for p in gpt2_model.parameters() if p.requires_grad)
This is a very big model with almost a billion parameters.
The gpt2-xl model should have about 1.5B parameters.
Here is how you can make it complete your thought.
import random import numpy as np seed = random.randint(0, 13) np.random.seed(seed) torch.random.manual_seed(seed) torch.cuda.manual_seed(seed) device = torch.device("cuda" if torch.cuda.is_available() else "cpu") text = """All of this is right here, ready to be used in your favorite pizza recipes.""" input_ids = torch.tensor(gpt2_tokenizer.encode(text, add_special_tokens=True)).unsqueeze(0) # bs=1 gpt2_model.to(device) gpt2_model.eval() outputs = gpt2_model.generate( input_ids.to(device), max_length=500, do_sample=True, top_k=20, temperature=0.7 ) print(gpt2_tokenizer.decode(outputs, skip_special_tokens=True)) outputs.shape,outputs.shape # (torch.Size([1, 500]), torch.Size())
All of this is right here, ready to be used in your favorite pizza recipes. And if you want to try the recipe as written, you can use the "pizza dough" from the recipe. And since you probably want to make this pizza in a large casserole dish, here are the measurements and ingredients: 2 cups (1 lb) grated cheese 1 cup (1 stick) butter 1 cup (1 stick) flour 1 1/2 cups (5 1/2 oz) grated mozzarella cheese 1/4 cup (1/2 stick) sugar 1/4 cup (1/2 stick) cornstarch 1 1/2 cups (5 oz) water 1. Preheat the oven to 350 degrees F. 2. In a large bowl, mix the cheese, butter, flour and cornstarch. 3. In a small bowl, whisk together the water and 1/2 cup of the cheese mixture. 4. Pour the mixture into the casserole dish and bake for 30 minutes or until the cheese is melted. 5. Remove from the oven and let cool. 6. While the cheese is cooling, melt the remaining 2 cups of the cheese mixture in a large heavy bottomed pot. 7. Add the milk and mix until the mixture is smooth. 8. Add the flour and mix to combine. 9. Add the remaining 1 cup of cheese mixture, cornstarch and water and mix until the mixture is smooth. 10. Pour the cheese mixture into the prepared casserole dish and bake for 40 minutes or until the cheese is well melted and the topping is bubbly. 11. Serve with your favorite toppings and enjoy! Note: If you are not using a large casserole dish, you may need to use 1 cup of the cheese mixture for each cup of filling. If you are making this recipe, please do let us know what you think by posting a comment below. If you enjoyed this recipe, please share with your friends by clicking on any of the social media icons below, or by clicking the "Share This Recipe" button below. Thank you!
Why did we use the GPT2LMHeadModel?
You may note we used
GPT2LMHeadModel class with the extra linear layer at the end (head)
(lm_head): Linear(in_features=768, out_features=50257, bias=False)
This extra layer will convert the hidden states (exactly 768 hidden states) into 50257 out_features. Why is the 50257 important?
This is the vocab size of the GPT-2 transformer. Based on that at the end we will get the probabilities for each word from the vocab.
If we would not have the last layer we would not have the means to interpret the hidden states, or in other words our output would be the hidden states.
import torch from transformers import GPT2Tokenizer, GPT2Model tokenizer = GPT2Tokenizer.from_pretrained('gpt2') model = GPT2Model.from_pretrained('gpt2') input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0) # bs=1 outputs = model(input_ids) outputs_batch_0 = outputs # 0 -> first batch input_ids.shape, outputs_batch_0.shape
(torch.Size([1, 6]), torch.Size([1, 6, 768]))
Note how the out_hidden_states corresponds to the input_ids, word each input word we add the extra hidden dimension.
The LMHead part is there to extract the information from the hidden states and to convert them to output tokens.