Principal component analysis of BERT's static embeddings
BERT’s word embeddings
BERT is one of the seminal models in natural language processing (NLP). A team of Google researchers trained BERT to encode sequences of text tokens (sub-words) into useful numerical representations, which can then be used for many important NLP tasks, such as question answering and natural language inference.
Models like BERT build on earlier work such as word2vec and GloVe, which are algorithms for learning numerical vectors as representations of words.
In this notebook, I look at the static word embeddings learned by BERT. The bottom layer of the model holds one learned vector for every token in its vocabulary, before those vectors are passed through the encoder layers to produce contextualized representations. Where do these static vectors live in relation to each other?
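As a concrete starting point, the static embedding matrix can be pulled directly out of a pretrained checkpoint. Here is a minimal sketch using the Hugging Face transformers library and the bert-base-uncased checkpoint (both assumptions on my part; the notebook linked below may load the weights differently):

```python
from transformers import BertModel, BertTokenizer

# Load a pretrained BERT checkpoint (bert-base-uncased is an assumption here;
# any BERT variant exposes the same input embedding layer).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# The static, non-contextual word embeddings are the input embedding table:
# one row per vocabulary token, shape (vocab_size, hidden_size),
# roughly (30522, 768) for bert-base-uncased.
static_embeddings = model.embeddings.word_embeddings.weight.detach()
print(static_embeddings.shape)

# Sanity check: look up a few tokens and peek at the first few components.
for token in ["king", "queen", "paris"]:
    idx = tokenizer.convert_tokens_to_ids(token)
    print(token, idx, static_embeddings[idx][:5])
```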
Figures
Below are a few projections of BERT’s static embedding space.
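For reference, projections like these can be produced by fitting PCA to the embedding matrix extracted above and keeping the first two principal components. The sketch below assumes scikit-learn and matplotlib; the plotting details in the notebook linked below may differ.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Fit PCA on the full (vocab_size, hidden_size) embedding matrix and
# project every token vector onto the first two principal components.
vectors = static_embeddings.numpy()
pca = PCA(n_components=2)
projected = pca.fit_transform(vectors)
print("explained variance ratio:", pca.explained_variance_ratio_)

# Scatter-plot a random subsample of the vocabulary to keep the figure legible.
rng = np.random.default_rng(seed=0)
sample = rng.choice(len(projected), size=2000, replace=False)
plt.figure(figsize=(8, 8))
plt.scatter(projected[sample, 0], projected[sample, 1], s=2, alpha=0.5)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.title("BERT static embeddings, first two principal components")
plt.show()
```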
Notebook
Below is a notebook you can use to reproduce these visualizations yourself.
Download this Jupyter notebook