Reinforcement learning in the context of Supervised learning




⭐ Core Definition: Reinforcement learning

In machine learning and optimal control, reinforcement learning (RL) is concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize a reward signal. Reinforcement learning is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning.

While supervised learning and unsupervised learning algorithms respectively attempt to discover patterns in labeled and unlabeled data, reinforcement learning involves training an agent through interactions with its environment. To learn to maximize rewards from these interactions, the agent makes decisions between trying new actions to learn more about the environment (exploration), or using current knowledge of the environment to take the best action (exploitation). The search for the optimal balance between these two strategies is known as the exploration–exploitation dilemma.
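The exploration–exploitation trade-off described above can be sketched with an epsilon-greedy agent on a multi-armed bandit. This is a minimal illustrative sketch, not from the source; the arm reward means and parameters are assumptions chosen for the example:

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Balance exploration and exploitation on a multi-armed bandit.

    With probability epsilon the agent explores (picks a random arm);
    otherwise it exploits its current knowledge (picks the arm with the
    highest estimated mean reward). `true_means` lists each arm's
    hypothetical expected reward, observed only through noisy samples.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms      # running estimate of each arm's value
    counts = [0] * n_arms
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = true_means[arm] + rng.gauss(0.0, 1.0)  # noisy reward signal
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return estimates, total_reward

estimates, total = epsilon_greedy_bandit([0.2, 0.5, 0.9])
```

With enough interactions, the occasional exploratory pulls let the agent discover that the last arm pays best, after which exploitation concentrates on it.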


In this Dossier

Reinforcement learning in the context of Automated planning and scheduling

Automated planning and scheduling, sometimes denoted as simply AI planning, is a branch of artificial intelligence that concerns the realization of strategies or action sequences, typically for execution by intelligent agents, autonomous robots and unmanned vehicles. Unlike classical control and classification problems, the solutions are complex and must be discovered and optimized in multidimensional space. Planning is also related to decision theory.

In known environments with an available model, planning can be done offline: solutions can be found and evaluated prior to execution. In dynamically unknown environments, the strategy often needs to be revised online, and models and policies must be adapted. Solutions usually resort to the iterative trial-and-error processes commonly seen in artificial intelligence, including dynamic programming, reinforcement learning and combinatorial optimization. Languages used to describe planning and scheduling are often called action languages.
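The offline-planning case, where the model is known in advance, can be sketched with dynamic programming. The value-iteration routine and the two-state model below are a hypothetical toy, not taken from the source:

```python
def value_iteration(transitions, rewards, gamma=0.9, tol=1e-6):
    """Offline planning by dynamic programming on a known model.

    `transitions[s][a]` gives the (deterministic) next state for action a
    in state s; `rewards[s][a]` gives the immediate reward. Returns the
    optimal state values and the greedy policy, found before execution.
    """
    states = list(transitions)
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(rewards[s][a] + gamma * V[transitions[s][a]]
                       for a in transitions[s])          # Bellman backup
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:                                  # converged
            break
    policy = {s: max(transitions[s],
                     key=lambda a: rewards[s][a] + gamma * V[transitions[s][a]])
              for s in states}
    return V, policy

# Hypothetical two-state model: from s0, action "go" earns reward 1.
model = {"s0": {"go": "s1", "stay": "s0"}, "s1": {"stay": "s1"}}
reward = {"s0": {"go": 1.0, "stay": 0.0}, "s1": {"stay": 0.0}}
V, policy = value_iteration(model, reward)
```

Because the model is fully known, the solution is computed and evaluated entirely offline; no trial-and-error interaction with the environment is needed.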

View the full Wikipedia page for Automated planning and scheduling

Reinforcement learning in the context of Automimicry

In zoology, automimicry, Browerian mimicry, or intraspecific mimicry, is a form of mimicry in which the same species of animal is imitated. There are two different forms.

In one form, first described by Lincoln Brower in 1967, weakly-defended members of a species with warning coloration are parasitic on more strongly-defended members of their species, mimicking them to provide the negative reinforcement learning required for warning signals to function. The mechanism, analogous to Batesian mimicry, is found in insects such as the monarch butterfly.

View the full Wikipedia page for Automimicry

Reinforcement learning in the context of Generative adversarial network

A generative adversarial network (GAN) is a class of machine learning frameworks and a prominent framework for approaching generative artificial intelligence. The concept was initially developed by Ian Goodfellow and his colleagues in June 2014. In a GAN, two neural networks compete with each other in the form of a zero-sum game, where one agent's gain is another agent's loss.

Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proved useful for semi-supervised learning, fully supervised learning, and reinforcement learning.
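The zero-sum game between the two networks can be sketched in one dimension with linear models. This is a schematic assumption-laden toy (Gaussian data, a linear generator, a logistic discriminator, hand-derived gradients), not a production GAN recipe:

```python
import math
import random

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def train_gan_1d(data_mean=4.0, steps=500, batch=32, lr=0.05, seed=0):
    """Alternating gradient steps for a toy 1-D GAN.

    Real samples come from N(data_mean, 1). The generator g(z) = a*z + b
    maps noise z ~ N(0, 1) to fakes; the discriminator D(x) = sigmoid(w*x + c)
    scores samples as real. D ascends log D(x) + log(1 - D(g(z)));
    G ascends the non-saturating objective log D(g(z)). One side's gain
    is the other's loss.
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0      # generator parameters
    w, c = 0.0, 0.0      # discriminator parameters
    for _ in range(steps):
        reals = [rng.gauss(data_mean, 1.0) for _ in range(batch)]
        zs = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fakes = [a * z + b for z in zs]
        # Discriminator step: push D toward 1 on reals, 0 on fakes.
        gw = gc = 0.0
        for x in reals:
            d = sigmoid(w * x + c)
            gw += (1 - d) * x
            gc += (1 - d)
        for x in fakes:
            d = sigmoid(w * x + c)
            gw -= d * x
            gc -= d
        w += lr * gw / batch
        c += lr * gc / batch
        # Generator step: make fakes score as real under the current D.
        ga = gb = 0.0
        for z, x in zip(zs, fakes):
            d = sigmoid(w * x + c)
            ga += (1 - d) * w * z    # grad of log D(g(z)) w.r.t. a
            gb += (1 - d) * w        # grad of log D(g(z)) w.r.t. b
        a += lr * ga / batch
        b += lr * gb / batch
    return a, b, w, c

a, b, w, c = train_gan_1d()
```

As training proceeds, the generator's offset `b` drifts toward the data mean, at which point the discriminator can no longer separate real from fake and its advantage shrinks.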

View the full Wikipedia page for Generative adversarial network

Reinforcement learning in the context of Multi-agent systems

A multi-agent system (MAS or "self-organized system") is a computerized system composed of multiple interacting intelligent agents. Multi-agent systems can solve problems that are difficult or impossible for an individual agent or a monolithic system to solve. Intelligence may include methodic, functional, procedural approaches, algorithmic search or reinforcement learning. With advancements in large language models (LLMs), LLM-based multi-agent systems have emerged as a new area of research, enabling more sophisticated interactions and coordination among agents.

Despite considerable overlap, a multi-agent system is not always the same as an agent-based model (ABM). The goal of an ABM is to search for explanatory insight into the collective behavior of agents (which do not necessarily need to be "intelligent") obeying simple rules, typically in natural systems, rather than to solve specific practical or engineering problems. The terminology of ABM tends to be used more often in the sciences, and MAS in engineering and technology. Applications where multi-agent systems research may deliver an appropriate approach include online trading, disaster response, target surveillance and social structure modelling.

View the full Wikipedia page for Multi-agent systems

Reinforcement learning in the context of Predictive learning

Predictive learning is a machine learning (ML) technique where an artificial intelligence model is fed new data to develop an understanding of its environment, capabilities, and limitations. This technique finds application in many areas, including neuroscience, business, robotics, and computer vision. This concept was developed and expanded by French computer scientist Yann LeCun in 1988 during his career at Bell Labs, where he trained models to detect handwriting so that financial companies could automate check processing.

The mathematical foundation for predictive learning dates back to the 17th century, when the British insurance company Lloyd's used predictive analytics to make a profit. Starting out as a mathematical concept, this method expanded the possibilities of artificial intelligence. Predictive learning is an attempt to learn with a minimum of pre-existing mental structure. It was inspired by Jean Piaget's account of children constructing knowledge of the world through interaction. Gary Drescher's book Made-up Minds was crucial to the development of this concept.

View the full Wikipedia page for Predictive learning

Reinforcement learning in the context of Andrew Barto

Andrew Gehret Barto (born 1948) is an American computer scientist who is professor emeritus of computer science at the University of Massachusetts Amherst. Barto is best known for his foundational contributions to the field of modern computational reinforcement learning.

View the full Wikipedia page for Andrew Barto

Reinforcement learning in the context of Richard S. Sutton

Richard Stuart Sutton FRS FRSC (born 1957 or 1958) is a Canadian computer scientist. He is a professor of computing science at the University of Alberta, fellow & Chief Scientific Advisor at the Alberta Machine Intelligence Institute, and a research scientist at Keen Technologies. Sutton is considered one of the founders of modern computational reinforcement learning. In particular, he contributed to temporal difference learning and policy gradient methods. He received the 2024 Turing Award with Andrew Barto.
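Temporal difference learning, mentioned above, updates a value estimate from the one-step TD error rather than waiting for an episode's final outcome. A minimal TD(0) policy-evaluation sketch on a hypothetical three-state chain (the environment and parameters are illustrative assumptions):

```python
def td0_policy_evaluation(episodes=200, alpha=0.1, gamma=0.9):
    """TD(0) value estimation on a deterministic 3-state chain.

    States 0 -> 1 -> 2 (terminal); the transition into the terminal
    state yields reward 1, all others 0. After every step, V(s) moves
    toward the bootstrapped target r + gamma * V(s') by step size alpha.
    """
    V = [0.0, 0.0, 0.0]
    for _ in range(episodes):
        s = 0
        while s != 2:
            s_next = s + 1
            r = 1.0 if s_next == 2 else 0.0
            V[s] += alpha * (r + gamma * V[s_next] - V[s])  # TD error update
            s = s_next
    return V

V = td0_policy_evaluation()
```

The estimates converge to the true discounted values: V(1) = 1 (one step from the reward) and V(0) = gamma * V(1) = 0.9.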

View the full Wikipedia page for Richard S. Sutton

Reinforcement learning in the context of Focused crawler

A focused crawler is a web crawler that collects Web pages that satisfy some specific property, by carefully prioritizing the crawl frontier and managing the hyperlink exploration process. Some predicates may be based on simple, deterministic and surface properties. For example, a crawler's mission may be to crawl pages from only the .jp domain. Other predicates may be softer or comparative, e.g., "crawl pages about baseball", or "crawl pages with large PageRank". An important page property pertains to topics, leading to 'topical crawlers'. For example, a topical crawler may be deployed to collect pages about solar power, swine flu, or even more abstract concepts like controversy while minimizing resources spent fetching pages on other topics. Crawl frontier management may not be the only device used by focused crawlers; they may use a Web directory, a Web text index, backlinks, or any other Web artifact.

A focused crawler must predict the probability that an unvisited page will be relevant before actually downloading the page. A possible predictor is the anchor text of links; this was the approach taken by Pinkerton in a crawler developed in the early days of the Web. Topical crawling was first introduced by Filippo Menczer. Chakrabarti et al. coined the term 'focused crawler' and used a text classifier to prioritize the crawl frontier. Andrew McCallum and co-authors also used reinforcement learning to focus crawlers. Diligenti et al. traced the context graph leading up to relevant pages, and their text content, to train classifiers. A form of online reinforcement learning has been used, along with features extracted from the DOM tree and text of linking pages, to continually train classifiers that guide the crawl. In a review of topical crawling algorithms, Menczer et al. show that such simple strategies are very effective for short crawls, while more sophisticated techniques such as reinforcement learning and evolutionary adaptation can give the best performance over longer crawls. It has been shown that spatial information is important to classify Web documents.
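The frontier prioritization described above can be sketched as a best-first crawl over a priority queue. This is a toy sketch: `link_graph` stands in for real page fetching, and `score` stands in for a learned relevance predictor (e.g. a classifier over anchor text); both are hypothetical:

```python
import heapq

def focused_crawl(seed_urls, link_graph, score, budget=10):
    """Best-first focused crawl with a relevance-ordered frontier.

    `link_graph` maps a URL to the URLs it links to (a stand-in for
    fetching and parsing a page); `score` predicts how relevant an
    unvisited URL is likely to be. The crawl frontier is a priority
    queue so the most promising page is always downloaded next.
    """
    frontier = [(-score(u), u) for u in seed_urls]  # max-heap via negation
    heapq.heapify(frontier)
    visited = []
    seen = set(seed_urls)
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)   # most promising page first
        visited.append(url)
        for link in link_graph.get(url, []):
            if link not in seen:           # enqueue each outlink once
                seen.add(link)
                heapq.heappush(frontier, (-score(link), link))
    return visited

# Toy graph: pages about "solar" are relevant, others are not.
graph = {
    "seed": ["solar-1", "sports-1"],
    "solar-1": ["solar-2", "sports-2"],
    "sports-1": ["sports-3"],
}
relevance = lambda url: 1.0 if "solar" in url else 0.1
order = focused_crawl(["seed"], graph, relevance, budget=4)
```

With this ordering the crawler exhausts the high-scoring "solar" pages before spending any of its budget on off-topic links, which is the economy a focused crawler is built for.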

View the full Wikipedia page for Focused crawler