Transformers — Brief Review

Oct 24, 2023

The term ‘Transformer’ was first introduced in the paper ‘Attention Is All You Need,’ published in 2017. Artificial intelligence models that are widely used today, such as ChatGPT, are powered by Transformers. Instead of examining words one by one, a Transformer processes all the words in a sequence at once, capturing the connections between them. It was also the first model to compute input and output representations entirely with self-attention, without using sequence-aligned RNNs or convolutions. In this article, I will briefly discuss the basic building blocks of Transformers.
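To make the self-attention idea concrete, below is a minimal NumPy sketch of the scaled dot-product attention the paper defines. The weight matrices and the tiny dimensions are illustrative assumptions, not details taken from this article.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model) token representations.
    # Wq, Wk, Wv: (d_model, d_k) learned projections to queries, keys, values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # scores[i, j] measures how much token i should attend to token j.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)  # one attention distribution per token
    return weights @ V                  # weighted mix of value vectors

# Toy example: 4 tokens, model width 8, head width 4 (sizes chosen for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 4)
```

Each output row is a weighted average of the value vectors, so every token’s new representation depends on all the other tokens in the sequence at once, with no recurrence involved.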

Word Embedding

Transformers are a type of artificial neural network, so they expect numeric inputs at the input layer; words must therefore be converted into numbers first. There are many ways to do this, but the method most commonly used with neural networks is Word Embedding (word vectors). These vectors give Transformers a representation of textual data they can actually compute with. Word embedding assigns a vector to every word and symbol in the vocabulary we want to use; these words and symbols are referred to as tokens.
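As a rough sketch of that lookup (the toy vocabulary, embedding width, and random initialization below are my own assumptions for illustration):

```python
import numpy as np

# Toy vocabulary; real models use subword tokenizers with tens of thousands of tokens.
vocab = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3}
d_model = 8  # embedding width, chosen arbitrarily here

# The embedding table is a learned (vocab_size, d_model) matrix;
# random initialization stands in for values a real model would learn.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), d_model))

token_ids = [vocab[w] for w in ["the", "cat", "sat"]]  # words -> token ids
X = embedding_table[token_ids]                         # token ids -> vectors
print(X.shape)  # (3, 8): one d_model-dimensional vector per token
```

During training the table is updated along with the rest of the network, so tokens that appear in similar contexts tend to end up with similar vectors.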

Positional Encoding

Even if two sentences contain the same words, the order in which the words appear can change the meaning. Transformers use Positional Encoding…
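For reference, the original paper implements this with fixed sine and cosine waves of different frequencies that are simply added to the word embeddings. A minimal sketch of that scheme (sequence length and width are illustrative, and d_model is assumed even):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # From 'Attention is All You Need':
    #   PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    #   PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1) positions
    i = np.arange(0, d_model, 2)[None, :]    # even dimension indices
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions get cosine
    return pe

# The encoding is added to the word embeddings, so the same word at two
# different positions produces two different input vectors.
pe = positional_encoding(seq_len=4, d_model=8)
print(pe.shape)  # (4, 8)
```

Because every position gets a unique pattern across these dimensions, the model can tell identical words apart by where they occur in the sentence.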



Written by Buse Şenol

BAU Software Engineering | Data Scientist | The AI Lens Editor | https://www.linkedin.com/in/busesenoll/
