Transformers — Brief Review
The term ‘Transformer’ was first introduced in the paper ‘Attention Is All You Need,’ published in 2017. Artificial intelligence models in wide use today, such as ChatGPT, are powered by Transformers. Instead of processing words one at a time, a Transformer attends to all the words in a sequence at once to capture the relationships between them. It was also the first model to compute representations of its input and output entirely through self-attention, without relying on sequence-aligned RNNs or convolutions. In this article, I will briefly discuss the basic building blocks of Transformers.
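As a rough illustration of that core operation, here is a minimal sketch of the scaled dot-product attention defined in the paper, softmax(QKᵀ/√d_k)V, in PyTorch. The function name and tensor shapes below are my own illustrative choices, not values from any particular model:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (seq_len, d_k) tensors; scores are scaled by sqrt(d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    weights = F.softmax(scores, dim=-1)  # each row sums to 1
    return weights @ v

# Self-attention: queries, keys, and values all come from the same sequence
x = torch.randn(5, 64)   # 5 tokens, each a 64-dimensional vector (illustrative)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)         # torch.Size([5, 64])
```

Because every token attends to every other token in one matrix multiplication, the whole sequence is processed in parallel rather than step by step as in an RNN.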
Word Embedding
Transformers are a type of artificial neural network, so they expect numerical inputs at the input layer, which means words must first be converted into numbers. There are many ways to do this, but the method most commonly used with neural networks is Word Embedding (word vectors). These vectors help the Transformer make sense of textual data. The goal of word embedding is to assign a vector to each word and symbol in the vocabulary we want to use. The words and symbols in the vocabulary are referred to as tokens.
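To make this concrete, here is a minimal sketch of an embedding lookup in PyTorch. The vocabulary size, embedding dimension, and token IDs are illustrative assumptions, not values from any specific tokenizer or model:

```python
import torch
import torch.nn as nn

vocab_size, d_model = 10_000, 512    # assumed vocabulary and embedding sizes
embedding = nn.Embedding(vocab_size, d_model)

# Token IDs for a short, hypothetical sentence after tokenization
token_ids = torch.tensor([4, 87, 1023, 7])
vectors = embedding(token_ids)       # one d_model-dimensional vector per token
print(vectors.shape)                 # torch.Size([4, 512])
```

The embedding table is itself a learned layer, so these vectors are adjusted during training along with the rest of the network.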
Positional Encoding
Even when two sentences contain the same words, the order of those words can change the meaning. Transformers use Positional Encoding…
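For reference, ‘Attention Is All You Need’ defines a sinusoidal positional encoding, PE(pos, 2i) = sin(pos/10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos/10000^(2i/d_model)), which is added to the word embeddings. A minimal sketch in PyTorch, with an illustrative sequence length and model dimension:

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

# Added to the word embeddings so the model can distinguish token positions
pe = sinusoidal_positional_encoding(seq_len=4, d_model=512)
print(pe.shape)  # torch.Size([4, 512])
```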