Large language model in the context of Self-supervised learning


⭐ Core Definition: Large language model

A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs) and provide the core capabilities of modern chatbots. LLMs can be fine-tuned for specific tasks or guided by prompt engineering. These models acquire predictive power regarding syntax, semantics, and ontologies inherent in human language corpora, but they also inherit inaccuracies and biases present in the data they are trained on.

They typically contain billions to trillions of parameters and operate as general-purpose sequence models, generating, summarizing, translating, and reasoning over text. LLMs are notable for their ability to generalize across tasks with minimal task-specific supervision, enabling applications such as conversational agents, code generation, knowledge retrieval, and automated reasoning that previously required bespoke systems.
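The self-supervised setup described above can be sketched in a few lines: the training signal comes from the text itself, with each token serving as the label for the prefix that precedes it, so no human annotation is needed. The function below is illustrative, not taken from any particular framework.

```python
# Toy sketch of the self-supervised next-token objective used in LLM
# pre-training: every position in a raw token sequence supplies its own
# label (the token that follows), so unlabeled text becomes training data.

def next_token_pairs(tokens):
    """Turn an unlabeled token sequence into (context, target) training pairs."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

corpus = ["the", "cat", "sat", "on", "the", "mat"]
pairs = next_token_pairs(corpus)
# e.g. the pair (["the", "cat"], "sat"): predict "sat" from its prefix.
```

A model trained on such pairs learns a distribution over the next token given the context; generation then repeatedly samples from that distribution.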


In this Dossier

Large language model in the context of Generative literature

Generative literature is poetry or fiction that is automatically generated, often using computers. It is a genre of electronic literature, and also related to generative art.

John Clark's Latin Verse Machine (1830–1843) is probably the first example of mechanised generative literature, while Christopher Strachey's love letter generator (1952) is the first digital example. With the large language models (LLMs) of the 2020s, generative literature is becoming increasingly common.

View the full Wikipedia page for Generative literature

Large language model in the context of Generative artificial intelligence

Generative artificial intelligence (Generative AI, or GenAI) is a subfield of artificial intelligence that uses generative models to produce text, images, videos, audio, software code or other forms of data. These models learn the underlying patterns and structures of their training data and use them to produce new data based on the input, which often comes in the form of natural language prompts.

Generative AI tools have become more common since the AI boom in the 2020s. This boom was made possible by improvements in transformer-based deep neural networks, particularly large language models (LLMs). Major tools include chatbots such as ChatGPT, Copilot, Gemini, Claude, Grok, and DeepSeek; text-to-image models such as Stable Diffusion, Midjourney, and DALL-E; and text-to-video models such as Veo and Sora. Technology companies developing generative AI include OpenAI, xAI, Anthropic, Meta AI, Microsoft, Google, Mistral AI, DeepSeek, Baidu and Yandex.

View the full Wikipedia page for Generative artificial intelligence

Large language model in the context of Language model

A language model is a probabilistic model of natural language. Language models are useful for a variety of tasks, including speech recognition, machine translation, natural language generation (generating more human-like text), optical character recognition, handwriting recognition, grammar induction, and information retrieval.

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using texts scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as the word n-gram language model.
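For contrast with the transformer-based LLMs above, a word bigram model (the simplest interesting n-gram case) can be built from raw counts. This is a toy maximum-likelihood version; practical n-gram models add smoothing so unseen pairs do not get zero probability.

```python
from collections import Counter, defaultdict

# Toy word-bigram language model: the purely statistical kind of model
# that recurrent and then transformer models superseded. Probabilities
# are raw maximum-likelihood counts, with no smoothing.

def train_bigram(words):
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def prob(follows, prev, nxt):
    """P(nxt | prev) under the counted bigram model."""
    total = sum(follows[prev].values())
    return follows[prev][nxt] / total if total else 0.0

model = train_bigram("the cat sat on the mat".split())
# "the" is followed once by "cat" and once by "mat", so each gets P = 0.5.
```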

View the full Wikipedia page for Language model

Large language model in the context of Chatbot

A chatbot (originally chatterbot) is a software application or web interface designed to have textual or spoken conversations. Modern chatbots are typically online and use generative artificial intelligence systems that are capable of maintaining a conversation with a user in natural language and simulating the way a human would behave as a conversational partner. Such chatbots often use deep learning and natural language processing, but simpler chatbots have existed for decades.

Chatbots have increased in popularity as part of the AI boom of the 2020s, driven by the popularity of ChatGPT, followed by competitors such as Gemini, Claude, and later Grok. AI chatbots typically use a foundational large language model, such as GPT-4 or the Gemini language model, that is fine-tuned for specific uses.

View the full Wikipedia page for Chatbot

Large language model in the context of Prompt (natural language)

Prompt engineering is the process of structuring or crafting an instruction in order to produce better outputs from a generative artificial intelligence (AI) model. It typically involves designing clear queries, adding relevant context, and refining wording to guide the model toward more accurate, useful, and consistent responses.

A prompt is natural language text describing the task that an AI should perform. A prompt for a text-to-text language model can be a query, a command, or a longer statement including context, instructions, and conversation history. Prompt engineering may involve phrasing a query, specifying a style, choice of words and grammar, providing relevant context, or describing a character for the AI to mimic.
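The components listed above (instructions, context, conversation history, and the query itself) can be assembled into a single prompt string. The layout and section labels below are illustrative conventions, not a standard format.

```python
# Sketch of assembling a prompt from the parts a text-to-text model can
# accept: instructions, relevant context, prior turns, and the new query.
# The "Instructions:" / "Context:" labels are hypothetical conventions.

def build_prompt(instructions, context, history, query):
    parts = [f"Instructions: {instructions}", f"Context: {context}"]
    parts += [f"{role}: {text}" for role, text in history]
    parts.append(f"User: {query}")
    return "\n".join(parts)

prompt = build_prompt(
    instructions="Answer concisely in one sentence.",
    context="The user is reading about transformers.",
    history=[("User", "What is attention?"),
             ("Assistant", "A weighting over tokens.")],
    query="And multi-head attention?",
)
```

Prompt engineering in practice is iterative: the wording, ordering, and amount of context in each section are refined against the model's actual outputs.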

View the full Wikipedia page for Prompt (natural language)

Large language model in the context of AI boom

An AI boom is a period of rapid growth in the field of artificial intelligence (AI). The current boom is an ongoing period that began in the 2010s and accelerated sharply in the 2020s. Examples include generative AI technologies, such as large language models and AI image generators developed by companies like OpenAI, as well as scientific advances, such as the protein structure prediction work led by Google DeepMind. This period is sometimes referred to as an AI spring, a term used to differentiate it from previous AI winters. As of 2025, ChatGPT has become one of the most visited websites globally, behind only a handful of sites such as Google, YouTube, and Facebook.

View the full Wikipedia page for AI boom

Large language model in the context of Transformer (machine learning model)

In deep learning, the transformer is an artificial neural network architecture based on the multi-head attention mechanism, in which text is converted to numerical representations called tokens, and each token is converted into a vector via lookup from a word embedding table. At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and less important tokens to be diminished.

Transformers have the advantage of having no recurrent units, therefore requiring less training time than earlier recurrent neural architectures (RNNs) such as long short-term memory (LSTM). Later variations have been widely adopted for training large language models (LLMs) on large (language) datasets.
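The attention step described above can be sketched as single-head scaled dot-product attention in plain Python: each token's output is a softmax-weighted mix of every (unmasked) token's value vector, so the signal from relevant tokens is amplified and that of less important tokens diminished. Real transformers apply learned query/key/value projections, masking, and several heads in parallel, all omitted here for brevity.

```python
import math

# Minimal single-head scaled dot-product attention over small Python
# lists. queries, keys, values are lists of equal-length vectors.

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]  # similarity to each token
        weights = softmax(scores)                          # normalized attention weights
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])       # weighted mix of values
    return out
```

With identical keys the weights are uniform, so each output is simply the mean of the value vectors, which makes the weighting easy to verify by hand.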

View the full Wikipedia page for Transformer (machine learning model)

Large language model in the context of Microsoft Copilot

Microsoft Copilot is a generative artificial intelligence chatbot developed by Microsoft AI, a division of Microsoft. Based on OpenAI's GPT-4 and GPT-5 series of large language models, it was launched in 2023 as Microsoft's main replacement for the discontinued Cortana.

The service was introduced in February 2023 under the name Bing Chat, as a built-in feature for Microsoft Bing and Microsoft Edge. Over the course of 2023, Microsoft began to unify the Copilot branding across its various chatbot products, cementing the "copilot" analogy. At its Build 2023 conference, Microsoft announced its plans to integrate Copilot into Windows 11, allowing users to access it directly through the taskbar. In January 2024, a dedicated Copilot key was announced for Windows keyboards.

View the full Wikipedia page for Microsoft Copilot

Large language model in the context of Gemini (chatbot)

Gemini (formerly known as Bard) is a generative artificial intelligence chatbot and virtual assistant developed by Google. Based on the large language model (LLM) of the same name, it was launched on March 21, 2023 in response to the rise of OpenAI's ChatGPT.

View the full Wikipedia page for Gemini (chatbot)

Large language model in the context of Claude (language model)

Claude is a series of large language models developed by Anthropic. The first generation, Claude 1, was released in March 2023, and the latest, Claude Opus 4.5, in November 2025. The data for these models comes from sources such as Internet text, data from paid contractors, and Claude users.

View the full Wikipedia page for Claude (language model)

Large language model in the context of Grok (chatbot)

Grok is a generative artificial intelligence (generative AI) chatbot developed by xAI. It was launched in November 2023 as an initiative by Elon Musk and is based on the large language model (LLM) of the same name. Grok has apps for iOS and Android and is integrated with X (formerly Twitter) and Tesla's Optimus robot. The chatbot is named after the verb grok, coined by American author Robert A. Heinlein in his 1961 science fiction novel Stranger in a Strange Land to describe a deep, intuitive form of understanding.

The bot has generated various controversial responses, including conspiracy theories, Nazism, antisemitism, and praise of Adolf Hitler, as well as referring to Musk's views when asked about controversial topics or difficult decisions. Updates since 2023 have shifted the bot politically rightward to provide conservative responses to user queries.

View the full Wikipedia page for Grok (chatbot)

Large language model in the context of OpenAI

OpenAI is an American artificial intelligence (AI) organization headquartered in San Francisco, California. It aims to develop "safe and beneficial" artificial general intelligence (AGI), which it defines as "highly autonomous systems that outperform humans at most economically valuable work". As a leading organization in the ongoing AI boom, OpenAI is known for the GPT family of large language models, the DALL-E series of text-to-image models, and a text-to-video model named Sora. Its release of ChatGPT in November 2022 has been credited with catalyzing widespread interest in generative AI.

The organization has a complex corporate structure. As of October 2025, it is led by the non-profit OpenAI Foundation, founded in 2015 and registered in Delaware, which holds a 26% equity stake in OpenAI Group PBC, a for-profit public benefit corporation which commercializes its products. Microsoft invested over $13 billion into OpenAI, and provides Azure cloud computing resources. In October 2025, OpenAI conducted a $6.6 billion share sale that valued the company at $500 billion. On 28 October 2025, OpenAI said it had converted its main business into a for-profit corporation, with Microsoft acquiring a 27% stake in the company and the remaining non-profit company (now known as the OpenAI Foundation) owning a 26% stake.

View the full Wikipedia page for OpenAI

Large language model in the context of Anthropic

Anthropic PBC is an American artificial intelligence (AI) startup company founded in 2021. It has developed a family of large language models (LLMs) named Claude. The company researches and develops AI to "study their safety properties at the technological frontier" and use this research to deploy safe models for the public.

Anthropic was founded by former members of OpenAI, including siblings Daniela Amodei and Dario Amodei, who serve as president and CEO respectively. In September 2023, Amazon announced an investment of up to $4 billion. Google committed $2 billion the next month. As of November 2025, Anthropic is the third most valuable private company globally, valued at over $350 billion.

View the full Wikipedia page for Anthropic

Large language model in the context of Mistral AI

Mistral AI SAS (French: [mistʁal]) is a French artificial intelligence (AI) company headquartered in Paris. Founded in 2023, it develops large language models (LLMs), offering both open-weight and proprietary models. As of 2025, the company has a valuation of more than US$14 billion.

View the full Wikipedia page for Mistral AI

Large language model in the context of DeepSeek

Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., doing business as DeepSeek, is a Chinese artificial intelligence (AI) company that develops large language models (LLMs). Based in Hangzhou, Zhejiang, DeepSeek is owned and funded by the Chinese hedge fund High-Flyer. DeepSeek was founded in July 2023 by Liang Wenfeng, the co-founder of High-Flyer, who also serves as the CEO for both of the companies. The company launched an eponymous chatbot alongside its DeepSeek-R1 model in January 2025.

Released under the MIT License, DeepSeek-R1 provides responses comparable to other contemporary large language models, such as OpenAI's GPT-4 and o1. Its training cost was reported to be significantly lower than other LLMs. The company claims that it trained its V3 model for US$6 million—far less than the US$100 million cost for OpenAI's GPT-4 in 2023—and using approximately one-tenth the computing power consumed by Meta's comparable model, Llama 3.1. DeepSeek's success against larger and more established rivals has been described as "upending AI".

View the full Wikipedia page for DeepSeek

Large language model in the context of Semi-supervised learning

Weak supervision (also known as semi-supervised learning) is a paradigm in machine learning whose relevance and notability increased with the advent of large language models, due to the large amount of data required to train them. It combines a small amount of human-labeled data (of the kind used exclusively in the more expensive and time-consuming supervised learning paradigm) with a large amount of unlabeled data (of the kind used exclusively in the unsupervised learning paradigm). In other words, desired output values are provided only for a subset of the training data; the remaining data is unlabeled or imprecisely labeled. Intuitively, the task can be seen as an exam, with the labeled data as sample problems that the teacher solves for the class as an aid in solving another set of problems. In the transductive setting, the unsolved problems act as the exam questions; in the inductive setting, they are practice problems of the sort that will make up the exam.
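One common weak-supervision recipe, self-training, can be sketched with a deliberately trivial one-dimensional classifier: fit on the few labeled points, pseudo-label the unlabeled pool, then refit on both. Everything here is a toy illustration; real systems use stronger models and keep only high-confidence pseudo-labels.

```python
# Toy self-training on 1-D data. The "classifier" is a threshold at the
# midpoint between the two class means; points above it get label 1.

def fit_threshold(xs, ys):
    """Midpoint between class means of labeled 1-D points."""
    lo = [x for x, y in zip(xs, ys) if y == 0]
    hi = [x for x, y in zip(xs, ys) if y == 1]
    return (sum(lo) / len(lo) + sum(hi) / len(hi)) / 2

def self_train(labeled_x, labeled_y, unlabeled_x):
    t = fit_threshold(labeled_x, labeled_y)                # fit on labeled data
    pseudo_y = [1 if x > t else 0 for x in unlabeled_x]    # pseudo-label the pool
    return fit_threshold(labeled_x + unlabeled_x,          # refit on both
                         labeled_y + pseudo_y)

# Two labeled points plus four unlabeled ones near the class centers.
t = self_train([0.0, 10.0], [0, 1], [1.0, 2.0, 8.0, 9.0])
```

In the transductive setting the pseudo-labels on the pool are themselves the desired output; in the inductive setting the refit threshold is kept to classify future points.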

View the full Wikipedia page for Semi-supervised learning

Large language model in the context of Multi-agent systems

A multi-agent system (MAS or "self-organized system") is a computerized system composed of multiple interacting intelligent agents. Multi-agent systems can solve problems that are difficult or impossible for an individual agent or a monolithic system to solve. Intelligence may include methodic, functional, procedural approaches, algorithmic search or reinforcement learning. With advancements in large language models (LLMs), LLM-based multi-agent systems have emerged as a new area of research, enabling more sophisticated interactions and coordination among agents.

Despite considerable overlap, a multi-agent system is not always the same as an agent-based model (ABM). The goal of an ABM is to search for explanatory insight into the collective behavior of agents (which do not necessarily need to be "intelligent") obeying simple rules, typically in natural systems, rather than in solving specific practical or engineering problems. The terminology of ABM tends to be used more often in the science, and MAS in engineering and technology. Applications where multi-agent systems research may deliver an appropriate approach include online trading, disaster response, target surveillance and social structure modelling.
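The agent-interaction loop described above can be sketched with two rule-based stub agents exchanging messages through a shared transcript. In an LLM-based multi-agent system each respond() call would instead query a language model; the agents and the "DONE" convention here are hypothetical.

```python
# Minimal multi-agent message loop: two agents alternate turns, reading
# the shared transcript and appending a reply, until one signals "DONE".

class Agent:
    def __init__(self, name, replies):
        self.name = name
        self.replies = iter(replies)

    def respond(self, transcript):
        # Stub: replay canned replies; an LLM agent would condition on
        # the transcript here instead.
        return next(self.replies, "DONE")

def run_dialogue(a, b, max_turns=6):
    transcript = []
    agents = [a, b]
    for turn in range(max_turns):
        speaker = agents[turn % 2]
        msg = speaker.respond(transcript)
        transcript.append((speaker.name, msg))
        if msg == "DONE":
            break
    return transcript

log = run_dialogue(Agent("planner", ["propose step 1", "DONE"]),
                   Agent("critic", ["approve step 1"]))
```

Even this skeleton shows the coordination concerns of the field: turn-taking, a shared environment (the transcript), and a termination condition.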

View the full Wikipedia page for Multi-agent systems