Deep learning in the context of Representation learning


⭐ Core Definition: Deep learning

In machine learning, deep learning focuses on utilizing multilayered neural networks to perform tasks such as classification, regression, and representation learning. The field takes inspiration from biological neuroscience and revolves around stacking artificial neurons into layers and "training" them to process data. The adjective "deep" refers to the use of multiple layers (ranging from three to several hundred or thousands) in the network. Methods used can be supervised, semi-supervised or unsupervised.
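
To make the layer-stacking idea concrete, here is a minimal NumPy sketch of a forward pass through a small fully connected network. The layer sizes and random weights are illustrative, not taken from any particular system.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def mlp_forward(x, layers):
    """Pass an input through a stack of (weights, bias) layers."""
    for W, b in layers[:-1]:
        x = relu(x @ W + b)        # hidden layers build intermediate representations
    W, b = layers[-1]
    return x @ W + b               # final layer produces the task output

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 8)), np.zeros(8)),   # input -> hidden
          (rng.normal(size=(8, 8)), np.zeros(8)),   # hidden -> hidden
          (rng.normal(size=(8, 3)), np.zeros(3))]   # hidden -> 3 outputs
print(mlp_forward(rng.normal(size=(1, 4)), layers))
```

Training would adjust these weights by gradient descent; here they are random, so the output is meaningless, but the data flow through the stacked layers is the same.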

Some common deep learning network architectures include fully connected networks, deep belief networks, recurrent neural networks, convolutional neural networks, generative adversarial networks, transformers, and neural radiance fields. These architectures have been applied to fields including computer vision, speech recognition, natural language processing, machine translation, bioinformatics, drug design, medical image analysis, climate science, material inspection and board game programs, where they have produced results comparable to and in some cases surpassing human expert performance.

Deep learning in the context of Machine learning

Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalise to unseen data, and thus perform tasks without explicit instructions. Advances in the subdiscipline of deep learning have allowed neural networks, a class of statistical algorithms, to surpass many previous machine learning approaches in performance.
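
As a small illustration of learning from data and generalising to unseen data, the sketch below uses scikit-learn and its bundled iris dataset: the model is fit on one split of the data and scored on a held-out split it never saw.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # learn from data
print(clf.score(X_test, y_test))  # accuracy on unseen data measures generalisation
```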

ML finds application in many fields, including natural language processing, computer vision, speech recognition, email filtering, agriculture, and medicine. The application of ML to business problems is known as predictive analytics.

Deep learning in the context of Generative artificial intelligence

Generative artificial intelligence (Generative AI, or GenAI) is a subfield of artificial intelligence that uses generative models to produce text, images, videos, audio, software code or other forms of data. These models learn the underlying patterns and structures of their training data and use them to produce new data based on the input, which often comes in the form of natural language prompts.

Generative AI tools have become more common since the AI boom in the 2020s. This boom was made possible by improvements in transformer-based deep neural networks, particularly large language models (LLMs). Major tools include chatbots such as ChatGPT, Copilot, Gemini, Claude, Grok, and DeepSeek; text-to-image models such as Stable Diffusion, Midjourney, and DALL-E; and text-to-video models such as Veo and Sora. Technology companies developing generative AI include OpenAI, xAI, Anthropic, Meta AI, Microsoft, Google, Mistral AI, DeepSeek, Baidu and Yandex.

Deep learning in the context of Chatbot

A chatbot (originally chatterbot) is a software application or web interface designed to have textual or spoken conversations. Modern chatbots are typically online and use generative artificial intelligence systems that are capable of maintaining a conversation with a user in natural language and simulating the way a human would behave as a conversational partner. Such chatbots often use deep learning and natural language processing, but simpler chatbots have existed for decades.

Chatbots have increased in popularity as part of the AI boom of the 2020s, driven by the success of ChatGPT and followed by competitors such as Gemini, Claude and later Grok. AI chatbots typically use a foundational large language model, such as GPT-4 or the Gemini language model, which is fine-tuned for specific uses.
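
A minimal chat loop, sketched against OpenAI's Python SDK (the model name is illustrative), shows how the growing message history is what lets a chatbot maintain a conversation:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
history = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user = input("you> ")
    history.append({"role": "user", "content": user})
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})  # keep context for next turn
    print("bot>", answer)
```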

Deep learning in the context of Transformer (machine learning model)

In deep learning, the transformer is an artificial neural network architecture based on the multi-head attention mechanism. Text is converted to numerical representations called tokens, and each token is mapped to a vector via lookup in a word embedding table. At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, which amplifies the signal for key tokens and diminishes that of less important ones.
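
As an illustration of the core mechanism, here is a minimal NumPy sketch of single-head scaled dot-product attention. A real transformer runs several such heads in parallel and adds learned projections, residual connections, and feed-forward layers.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v); mask marks positions to hide."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # token-to-token similarity
    if mask is not None:
        scores = np.where(mask, -1e9, scores)  # masked tokens get ~zero weight
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                         # weighted sum of value vectors

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))            # 4 tokens, 8-dimensional vectors
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 8)
```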

Transformers have the advantage of having no recurrent units and therefore require less training time than earlier recurrent neural architectures (RNNs) such as long short-term memory (LSTM). Later variations have been widely adopted for training large language models (LLMs) on large language datasets.

Deep learning in the context of Text-to-image

A text-to-image model (T2I or TTI model) is a machine learning model which takes an input natural language prompt and produces an image matching that description.

Text-to-image models began to be developed in the mid-2010s, during the beginnings of the AI boom, as a result of advances in deep neural networks. In 2022, the output of state-of-the-art text-to-image models (such as OpenAI's DALL-E 2, Google Brain's Imagen, Stability AI's Stable Diffusion, Midjourney, and Runway's Gen-4) came to be regarded as approaching the quality of real photographs and human-drawn art.

Deep learning in the context of Stable Diffusion

Stable Diffusion is a deep learning text-to-image model released in 2022, based on diffusion techniques. The generative artificial intelligence technology is the premier product of Stability AI and is considered part of the ongoing artificial intelligence boom.

It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by a text prompt. Its development involved researchers from the CompVis Group at Ludwig Maximilian University of Munich and Runway with a computational donation from Stability and training data from non-profit organizations.
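
As a sketch of how the model is typically invoked through Hugging Face's diffusers library (the model id, prompt, and dtype are illustrative, and checkpoints have moved between namespaces over time):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative checkpoint id
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # diffusion sampling is far faster on a GPU

# The pipeline iteratively denoises random latents, conditioned on the prompt.
image = pipe("a detailed oil painting of a lighthouse at dusk").images[0]
image.save("lighthouse.png")
```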

Deep learning in the context of DALL-E

DALL-E, DALL-E 2, and DALL-E 3 (stylised DALL·E) are text-to-image models developed by OpenAI using deep learning methodologies to generate digital images from natural language descriptions known as prompts.

The first version of DALL-E was announced in January 2021. Its successor, DALL-E 2, was released the following year. DALL-E 3 was released natively in ChatGPT for ChatGPT Plus and ChatGPT Enterprise customers in October 2023, with availability via OpenAI's API and "Labs" platform following in early November. Microsoft implemented the model in Bing's Image Creator tool, with plans to bring it to its Designer app; through Bing's Image Creator, Microsoft Copilot runs on DALL-E 3. In March 2025, DALL-E 3 was replaced in ChatGPT by GPT Image 1's native image-generation capabilities.
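
For illustration, a minimal request to DALL-E 3 through OpenAI's Python SDK (version 1.x surface; the prompt is invented) might look like this:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.images.generate(
    model="dall-e-3",
    prompt="a watercolor sketch of a robot reading a book",
    size="1024x1024",
    n=1,
)
print(response.data[0].url)  # hosted URL of the generated image
```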

Deep learning in the context of Convolutional neural networks

A convolutional neural network (CNN) is a type of feedforward neural network that learns features via filter (or kernel) optimization. This type of deep learning network has been applied to process and make predictions from many different types of data, including text, images and audio. CNNs are the de facto standard in deep learning-based approaches to computer vision and image processing, and have only recently been replaced, in some cases, by newer deep learning architectures such as the transformer.

Vanishing and exploding gradients, seen during backpropagation in earlier neural networks, are mitigated by the regularization that comes from sharing weights over fewer connections. For example, in a fully connected layer, each neuron would need 10,000 weights to process an image sized 100 × 100 pixels. A convolutional layer, by contrast, applies cascaded convolution (or cross-correlation) kernels and needs only 25 weights per 5 × 5 filter, which is slid across the whole image. Higher-layer features are extracted from wider context windows than lower-layer features.
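
The weight-sharing arithmetic can be checked directly. The PyTorch sketch below counts the parameters of one fully connected neuron over a 100 × 100 image against one 5 × 5 convolutional filter:

```python
import torch.nn as nn

fc = nn.Linear(100 * 100, 1)                       # one neuron over a flattened 100 x 100 image
conv = nn.Conv2d(1, 1, kernel_size=5, bias=False)  # one 5 x 5 filter, shared across positions

print(sum(p.numel() for p in fc.parameters()))     # 10001 (10,000 weights + 1 bias)
print(sum(p.numel() for p in conv.parameters()))   # 25
```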

Deep learning in the context of Google Translate

Google Translate is a multilingual neural machine translation service developed by Google to translate text, documents and websites from one language into another. It offers a website interface, a mobile app for Android and iOS, and an API that helps developers build browser extensions and software applications. As of December 2025, Google Translate supports 249 languages and language varieties at various levels. It served over 200 million people daily in May 2013, and over 500 million total users as of April 2016, with more than 100 billion words translated daily.

Launched in April 2006 as a statistical machine translation service, it originally used United Nations and European Parliament documents and transcripts to gather linguistic data. Rather than translating languages directly, it first translated text to English and then pivoted to the target language in most of the language pairs it offered, with a few exceptions including Catalan–Spanish. During a translation, it looked for patterns in millions of documents to help decide which words to choose and how to arrange them in the target language. In recent years, it has used a deep learning model to power its translations. Its accuracy, which has been criticized on several occasions, has been measured to vary greatly across languages. In November 2016, Google announced that Google Translate would switch to a neural machine translation engine, Google Neural Machine Translation (GNMT), which translated "whole sentences at a time, rather than just piece by piece. It uses this broader context to help it figure out the most relevant translation, which it then rearranges and adjusts to be more like a human speaking with proper grammar".
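
To illustrate the pivoting scheme described above, here is a toy sketch; translate() is a hypothetical stand-in for a machine translation system, since Google's actual pipeline is not public:

```python
def pivot_translate(text, src, tgt, translate):
    """Translate src -> tgt by pivoting through English.

    `translate` is a hypothetical callable (text, src, tgt) -> text.
    """
    english = translate(text, src, "en")   # first hop: source -> pivot language
    return translate(english, "en", tgt)   # second hop: pivot -> target language
```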

Deep learning in the context of Generative pre-trained transformer

A generative pre-trained transformer (GPT) is a type of large language model (LLM) that is widely used in generative AI chatbots. GPTs are based on a deep learning architecture called the transformer. They are pre-trained on large datasets of unlabeled content and are able to generate novel content.

OpenAI was the first to apply generative pre-training to the transformer architecture, introducing the GPT-1 model in 2018. The company has since released many bigger GPT models. The popular chatbot ChatGPT, released in late 2022 (using GPT-3.5), was followed by many competitor chatbots using their own generative pre-trained transformers to generate text, such as Gemini, DeepSeek or Claude.
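
As a small, openly available example of a generative pre-trained transformer, GPT-2 can be run through Hugging Face's transformers library; the prompt and decoding settings are illustrative:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # small open GPT model

# The model continues the prompt with novel text, one token at a time.
out = generator("Deep learning is", max_new_tokens=30, num_return_sequences=1)
print(out[0]["generated_text"])
```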

Deep learning in the context of Fine-tuning (deep learning)

Fine-tuning (in deep learning) is the process of adapting a model trained for one task (the upstream task) to perform a different, usually more specific, task (the downstream task). It is considered a form of transfer learning, as it reuses knowledge learned from the original training objective.

Fine-tuning involves applying additional training (e.g., on new data) to the parameters of a neural network that have been pre-trained. Many variants exist. The additional training can be applied to the entire neural network, or to only a subset of its layers, in which case the layers that are not being fine-tuned are "frozen" (i.e., not changed during backpropagation). A model may also be augmented with "adapters"—lightweight modules inserted into the model's architecture that nudge the embedding space for domain adaptation. These contain far fewer parameters than the original model and can be fine-tuned in a parameter-efficient way by tuning only their weights and leaving the rest of the model's weights frozen.
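
One common variant, freezing all but the final layer, can be sketched in PyTorch as follows; the backbone here is a stand-in for a real pre-trained checkpoint:

```python
import torch
import torch.nn as nn

# Hypothetical "pre-trained" backbone, standing in for a real checkpoint.
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),              # head to be adapted to the downstream task
)

# Freeze everything, then unfreeze only the final layer.
for param in model.parameters():
    param.requires_grad = False
for param in model[-1].parameters():
    param.requires_grad = True

# Hand only the trainable parameters to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```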

Deep learning in the context of Foundation models

In artificial intelligence, a foundation model (FM), also known as a large X model (LxM, where "X" stands for a modality such as text, image, or sound), is a machine learning or deep learning model trained on vast datasets so that it can be applied across a wide range of use cases. Generative AI applications like large language models (LLMs) are common examples of foundation models.

Building foundation models is often highly resource-intensive, with the most advanced models costing hundreds of millions of dollars to cover the expenses of acquiring, curating, and processing massive datasets, as well as the compute power required for training. These costs stem from the need for sophisticated infrastructure, extended training times, and advanced hardware, such as GPUs. In contrast, adapting an existing foundation model for a specific task or using it directly is far less costly, as it leverages pre-trained capabilities and typically requires only fine-tuning on smaller, task-specific datasets.
