Transformer (machine learning) in the context of Training, validation, and test data sets

Transformer (machine learning) in the context of Training, validation, and test data sets

Transformer (machine learning) Study page number 1 of 1

Play TriviaQuestions Online!

Skip to study material about Transformer (machine learning) in the context of "Training, validation, and test data sets"

⭐ Core Definition: Transformer (machine learning)

In deep learning, the transformer is an artificial neural network architecture based on the multi-head attention mechanism, in which text is converted to numerical representations called tokens, and each token is converted into a vector via lookup from a word embedding table. At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and less important tokens to be diminished.

Transformers have the advantage of having no recurrent units, therefore requiring less training time than earlier recurrent neural architectures (RNNs) such as long short-term memory (LSTM). Later variations have been widely adopted for training large language models (LLMs) on large (language) datasets.

↓ Menu

HINT:

In this Dossier

⭐ Core Definition: Transformer (machine learning)
Transformer (machine learning) in the context of Language model

Transformer (machine learning) in the context of Language model

A language model is a model of the human brain's ability to produce natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, route optimization, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form as of 2019, are predominantly based on transformers trained on larger datasets (frequently using texts scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded the purely statistical models, such as the word n-gram language model.

View the full Wikipedia page for Language model

↑ Return to Menu