This work is based on BERT (Pre-training of Deep Bidirectional Transformers for Language Understanding): https://arxiv.org/abs/1810.04805.
I used the transformers library; for background, see this video (https://www.youtube.com/watch?v=SZorAJ4I-sA) and this paper (https://arxiv.org/pdf/2003.08271.pdf). A small usage sketch is included below.
A Colab notebook with examples is available on my GitHub account.
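
For orientation, here is a minimal sketch of loading a pretrained BERT model with the Hugging Face transformers library. It is not the notebook code; the checkpoint name (bert-base-uncased) and the example sentence are placeholders chosen for illustration.

```python
# Minimal sketch: encode a sentence with a pretrained BERT model.
# Assumes `transformers` and `torch` are installed; the checkpoint
# name and the sample sentence are illustrative placeholders.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Tokenize a sample sentence and run it through BERT without gradients.
inputs = tokenizer("BERT produces deep bidirectional representations.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual token embeddings: (batch_size, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)
```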