Keywords: Transformer, GPT, BERT

Related Papers:
- Attention Is All You Need
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Improving Language Understanding by Generative Pre-Training