Cluster analysis in the context of "Machine learning"

⭐ Core Definition: Cluster analysis

Cluster analysis, or clustering, is a data analysis technique aimed at partitioning a set of objects into groups such that objects within the same group (called a cluster) exhibit greater similarity to one another (in some specific sense defined by the analyst) than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistical data analysis, used in many fields, including pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning.

Cluster analysis refers to a family of algorithms and tasks rather than one specific algorithm. It can be achieved by various algorithms that differ significantly in their understanding of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with small distances between cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings (including parameters such as the distance function to use, a density threshold or the number of expected clusters) depend on the individual data set and intended use of the results. Cluster analysis as such is not an automatic task, but an iterative process of knowledge discovery or interactive multi-objective optimization that involves trial and failure. It is often necessary to modify data preprocessing and model parameters until the result achieves the desired properties.
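As a concrete illustration of the "small distances between cluster members" notion, the sketch below implements Lloyd's k-means algorithm in plain Python. It is only one clustering algorithm among many. Several choices here are assumptions made for illustration: Euclidean distance as the distance function, the number of expected clusters `k` supplied by the analyst, and a deterministic farthest-first seeding (k-means++ is a common randomized alternative). The function names and sample points are hypothetical.

```python
import math

def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start from points[0],
    then repeatedly add the point farthest from its nearest chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps until no centroid moves."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move every centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[j])  # an empty cluster keeps its old centroid
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return labels, centroids

# Hypothetical data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Changing `k`, the distance function, or the seeding can change the partition that comes out, which is why these settings have to be tuned to the data set and to the intended use of the result.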

In this Dossier

Cluster analysis in the context of Categorization

Classification is the activity of assigning objects to some pre-existing classes or categories. This is distinct from the task of establishing the classes themselves (for example through cluster analysis). Examples include diagnostic tests, identifying spam emails and deciding whether to give someone a driving license.
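The distinction can be made concrete with a nearest-centroid classifier. The classes and their prototypes are fixed beforehand, and each new object is simply assigned to the closest one. The class names, the two numeric features and the prototype coordinates below are hypothetical.

```python
import math

# Hypothetical pre-existing classes for a spam filter, each summarised by a
# prototype point in a made-up 2-D feature space
# (share of links, share of all-caps words).
PROTOTYPES = {
    "spam": (0.9, 0.8),
    "not spam": (0.1, 0.2),
}

def classify(features, prototypes=PROTOTYPES):
    """Assign an object to the pre-existing class whose prototype is nearest."""
    return min(prototypes, key=lambda label: math.dist(features, prototypes[label]))

print(classify((0.8, 0.7)))
print(classify((0.2, 0.1)))
```

Cluster analysis addresses the step this code takes for granted: discovering the groups, and hence the prototypes, from unlabelled objects.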

As well as 'category', synonyms or near-synonyms for 'class' include 'type', 'species', 'form', 'order', 'concept', 'taxon', 'group', 'identification' and 'division'.

Cluster analysis in the context of Large-scale brain networks

Large-scale brain networks (also known as intrinsic brain networks) are collections of widespread brain regions showing functional connectivity by statistical analysis of the fMRI BOLD signal or other recording methods such as EEG, PET and MEG. An emerging paradigm in neuroscience is that cognitive tasks are performed not by individual brain regions working in isolation but by networks consisting of several discrete brain regions that are said to be "functionally connected". Functional connectivity networks may be found using algorithms such as cluster analysis, spatial independent component analysis (ICA), seed-based analysis, and others. Synchronized brain regions may also be identified using long-range synchronization of the EEG, MEG, or other dynamic brain signals.

The set of identified brain areas that are linked together in a large-scale network varies with cognitive function. When the cognitive state is not explicit (i.e., the subject is at "rest"), the large-scale brain network is a resting state network (RSN). As a physical system with graph-like properties, a large-scale brain network has both nodes and edges and cannot be identified simply by the co-activation of brain areas. In recent decades, the analysis of brain networks has been made feasible by advances in imaging techniques as well as new tools from graph theory and dynamical systems.
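A minimal sketch of how a simple clustering step can turn regional signals into a graph. Each region is a node, an edge links two regions whose time series correlate strongly, and each connected component is read as one candidate network. This is a simplified stand-in, not the ICA or seed-based pipelines used in practice. The region names, the synthetic sine-wave signals (standing in for BOLD time series) and the 0.7 threshold are all assumptions chosen for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def connectivity_networks(signals, threshold=0.7):
    """Nodes are regions; an edge joins two regions whose signals correlate with
    |r| >= threshold (hypothetical cut-off). Each connected component is one network."""
    names = list(signals)
    edges = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(signals[a], signals[b])) >= threshold:
                edges[a].add(b)
                edges[b].add(a)
    # Depth-first search collects the connected components of the graph.
    seen, networks = set(), []
    for start in names:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(edges[node] - component)
        seen |= component
        networks.append(component)
    return networks

# Synthetic signals for four hypothetical regions built from two underlying rhythms.
b1 = [math.sin(0.3 * t) for t in range(100)]
b2 = [math.sin(0.7 * t + 1) for t in range(100)]
signals = {
    "R1": b1,
    "R2": [u + 0.1 * v for u, v in zip(b1, b2)],
    "R3": b2,
    "R4": [v + 0.1 * u for u, v in zip(b1, b2)],
}
print(connectivity_networks(signals))
```

Thresholding is the simplest way to obtain edges, and the chosen threshold directly controls how many networks appear.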

Cluster analysis in the context of Genetic history of the Middle East

The genetic history of the Middle East is the subject of research within the fields of human population genomics, archaeogenetics and Middle Eastern studies. Researchers may use Y-DNA, mtDNA, other autosomal DNA, whole genome, or whole exome information to identify the genetic history of ancient and modern populations of Arabia, Egypt, the Levant, Mesopotamia, Persia, Turkey, and other areas.

Cluster analysis in the context of Simpson's paradox

Simpson's paradox is a phenomenon in probability and statistics in which a trend appears in several groups of data but disappears or reverses when the groups are combined. This result is often encountered in social-science and medical-science statistics, and is particularly problematic when frequency data are unduly given causal interpretations. The paradox can be resolved when confounding variables and causal relations are appropriately addressed in the statistical modeling (e.g., through cluster analysis).
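The reversal can be checked with simple arithmetic. The counts below are illustrative; they match a frequently quoted kidney-stone treatment example. Treatment A has the higher success rate within each stone-size group, yet the lower rate once the groups are pooled, because A was given mostly to the harder, large-stone cases. Stone size is the confounding variable. Exact fractions avoid any floating-point doubt in the comparisons.

```python
from fractions import Fraction

# Illustrative (successes, patients) counts per treatment and stone size.
COUNTS = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def group_rate(treatment, group):
    """Success rate of one treatment within one subgroup."""
    successes, patients = COUNTS[treatment][group]
    return Fraction(successes, patients)

def pooled_rate(treatment):
    """Success rate after the subgroups are combined."""
    successes = sum(s for s, _ in COUNTS[treatment].values())
    patients = sum(p for _, p in COUNTS[treatment].values())
    return Fraction(successes, patients)

for group in ("small", "large"):
    print(group, float(group_rate("A", group)), float(group_rate("B", group)))
print("pooled", float(pooled_rate("A")), float(pooled_rate("B")))
```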

Simpson's paradox has been used to illustrate the kind of misleading results that the misuse of statistics can generate.

Cluster analysis in the context of Dimensionality reduction

Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. Working in high-dimensional spaces can be undesirable for many reasons; raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is usually computationally intractable. Dimensionality reduction is common in fields that deal with large numbers of observations and/or large numbers of variables, such as signal processing, speech recognition, neuroinformatics, and bioinformatics.

Methods are commonly divided into linear and nonlinear approaches. Linear approaches can be further divided into feature selection and feature extraction. Dimensionality reduction can be used for noise reduction, data visualization, cluster analysis, or as an intermediate step to facilitate other analyses.
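The split between linear feature selection and feature extraction can be sketched on a small hypothetical data set that reduces two columns to one. Selection keeps one original variable; here the criterion is highest variance, chosen only for illustration. Extraction builds a new variable. Here that is the first principal component (PCA), found by power iteration on the covariance matrix. The sketch assumes the starting vector of ones is not orthogonal to the dominant eigenvector and that the data are not constant.

```python
import math

def center(rows):
    """Subtract each column's mean so the data are centred at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix (d x d) of already-centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)] for i in range(d)]

def select_feature(rows):
    """Feature selection: keep only the original column with the largest variance."""
    cov = covariance(center(rows))
    best = max(range(len(cov)), key=lambda j: cov[j][j])
    return [[r[best]] for r in rows]

def first_principal_component(rows, iterations=200):
    """Feature extraction (PCA): power iteration converges to the covariance
    matrix's dominant eigenvector, the direction of maximum variance."""
    cov = covariance(center(rows))
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(rows, v):
    """One new coordinate per row: the centred row's projection onto v."""
    return [sum(a * b for a, b in zip(r, v)) for r in center(rows)]

# Hypothetical data: two strongly correlated measurements per observation.
rows = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
v = first_principal_component(rows)
print("PCA axis:", v)
print("1-D projection:", project(rows, v))
print("selected feature:", select_feature(rows))
```

Because the two columns are nearly collinear, the one-dimensional projection keeps almost all of the variance. It could then be passed to a clustering algorithm or plotted directly.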
