AI Definitions: Transformers

Transformers - The core AI architecture that uses vectors to predict which token to generate next in response to a prompt. The prediction is based on the probability of what is likely to come next. Your text prompt is combined with the model's training data and parameters to produce new text. Transformers analyze all the words in a given body of text at the same time rather than working word by word. Previously, recurrent neural networks (RNNs) processed data sequentially, one word at a time, in the order in which the words appear. The transformer was first introduced in a 2017 Google research paper ("Attention Is All You Need") that described this deep learning architecture. The major AI models are built on these neural networks. A troubling downside to transformers is their ever-increasing power demands, which is why some researchers are exploring alternatives such as test-time training (TTT).
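The "all the words at the same time" behavior comes from the transformer's attention mechanism. A minimal sketch of scaled dot-product self-attention in plain Python (the function names and toy token vectors here are illustrative, not from any particular model):

```python
import math

def softmax(scores):
    # Turn raw similarity scores into a probability distribution.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention over a list of token vectors.

    Every token attends to every other token at once, rather than
    stepping through the sequence one word at a time as an RNN would.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Similarity of this token to every token in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        # Output is a probability-weighted mix of all the token vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])
    return out

# Toy example: 3 "tokens", each represented by a 2-dimensional vector.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = self_attention(tokens)
print(len(result), len(result[0]))  # 3 tokens in, 3 updated vectors out
```

Real transformers run this in parallel across many attention heads and project the tokens into separate query, key, and value spaces first, but the core idea is the same: each token's new representation is a probability-weighted blend of the whole sequence.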

More AI definitions