Outlier in the context of "Michelson–Morley experiment"

Play Trivia Questions online!

or

Skip to study material about Outlier in the context of "Michelson–Morley experiment"

Ad spacer

⭐ Core Definition: Outlier

In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to a variability in the measurement, an indication of novel data, or it may be the result of experimental error; the latter are sometimes excluded from the data set. An outlier can be an indication of exciting possibility, but can also cause serious problems in statistical analyses.

Outliers can occur by chance in any distribution, but they can indicate novel behaviour or structures in the data-set, measurement error, or that the population has a heavy-tailed distribution. In the case of measurement error, one wishes to discard them or use statistics that are robust to outliers, while in the case of heavy-tailed distributions, they indicate that the distribution has high skewness and that one should be very cautious in using tools or intuitions that assume a normal distribution. A frequent cause of outliers is a mixture of two distributions, which may be two distinct sub-populations, or may indicate 'correct trial' versus 'measurement error'; this is modeled by a mixture model.

↓ Menu

>>>PUT SHARE BUTTONS HERE<<<
In this Dossier

Outlier in the context of Data

Data (/ˈdtə/ DAY-tə, US also /ˈdætə/ DAT) are a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted formally. A datum is an individual value in a collection of data. Data are usually organized into structures such as tables that provide additional context and meaning, and may themselves be used as data in larger structures. Data may be used as variables in a computational process. Data may represent abstract ideas or concrete measurements.Data are commonly used in scientific research, economics, and virtually every other form of human organizational activity. Examples of data sets include price indices (such as the consumer price index), unemployment rates, literacy rates, and census data. In this context, data represent the raw facts and figures from which useful information can be extracted.

Data are collected using techniques such as measurement, observation, query, or analysis, and are typically represented as numbers or characters that may be further processed. Field data are data that are collected in an uncontrolled, in-situ environment. Experimental data are data that are generated in the course of a controlled scientific experiment. Data are analyzed using techniques such as calculation, reasoning, discussion, presentation, visualization, or other forms of post-analysis. Prior to analysis, raw data (or unprocessed data) is typically cleaned: Outliers are removed, and obvious instrument or data entry errors are corrected.

↑ Return to Menu

Outlier in the context of Robust statistics

Robust statistics are statistics that maintain their properties even if the underlying distributional assumptions are incorrect. Robust statistical methods have been developed for many common problems, such as estimating location, scale, and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from a parametric distribution. For example, robust methods work well for mixtures of two normal distributions with different standard deviations; under this model, non-robust methods like a t-test work poorly.

↑ Return to Menu

Outlier in the context of Standard deviation

In statistics, the standard deviation is a measure of the amount of variation of the values of a variable about its mean. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the values are spread out over a wider range. The standard deviation is commonly used in the determination of what constitutes an outlier and what does not. Standard deviation may be abbreviated SD or std dev, and is most commonly represented in mathematical texts and equations by the lowercase Greek letter σ (sigma), for the population standard deviation, or the Latin letter s, for the sample standard deviation.

The standard deviation of a random variable, sample, statistical population, data set, or probability distribution is the square root of its variance. (For a finite population, variance is the average of the squared deviations from the mean.) A useful property of the standard deviation is that, unlike the variance, it is expressed in the same unit as the data. Standard deviation can also be used to calculate standard error for a finite sample, and to determine statistical significance.

↑ Return to Menu

Outlier in the context of Average

An average of a collection or group is a value that is most central or most common in some sense, and represents its overall position.

In mathematics, especially in colloquial usage, it most commonly refers to the arithmetic mean, so the "average" of the list of numbers [2, 3, 4, 7, 9] is generally considered to be (2+3+4+7+9)/5 = 25/5 = 5. In situations where the data is skewed or has outliers, and it is desired to focus on the main part of the group rather than the long tail, "average" often instead refers to the median; for example, the average personal income is usually given as the median income, so that it represents the majority of the population rather than being overly influenced by the much higher incomes of the few rich people. In certain real-world scenarios, such computing the average speed from multiple measurements taken over the same distance, the average used is the harmonic mean. In situations where a histogram or probability density function is being referenced, the "average" could instead refer to the mode. Other statistics that can be used as an average include the mid-range and geometric mean, but they would rarely, if ever, be colloquially referred to as "the average".

↑ Return to Menu

Outlier in the context of Raw data

Raw data, also known as primary data, are data (e.g., numbers, instrument readings, figures, etc.) collected from a source. In the context of examinations, the raw data might be described as a raw score (after test scores).

If a scientist sets up a computerized thermometer which records the temperature of a chemical mixture in a test tube every minute, the list of temperature readings for every minute, as printed out on a spreadsheet or viewed on a computer screen are "raw data". Raw data have not been subjected to processing, "cleaning" by researchers to remove outliers, obvious instrument reading errors or data entry errors, or any analysis (e.g., determining central tendency aspects such as the average or median result). As well, raw data have not been subject to any other manipulation by a software program or a human researcher, analyst or technician. They are also referred to as primary data. Raw data is a relative term (see data), because even once raw data have been "cleaned" and processed by one team of researchers, another team may consider these processed data to be "raw data" for another stage of research. Raw data can be inputted to a computer program or used in manual procedures such as analyzing statistics from a survey. The term "raw data" can refer to the binary data on electronic storage devices, such as hard disk drives (also referred to as "low-level data").

↑ Return to Menu

Outlier in the context of Quartile

In statistics, quartiles are a type of quantiles which divide the number of data points into four parts, or quarters, of more-or-less equal size. The data must be ordered from smallest to largest to compute quartiles; as such, quartiles are a form of order statistic. The three quartiles, resulting in four data divisions, are as follows:

Along with the minimum and maximum of the data (which are also quartiles), the three quartiles described above provide a five-number summary of the data. This summary is important in statistics because it provides information about both the center and the spread of the data. Knowing the lower and upper quartile provides information on how big the spread is and if the dataset is skewed toward one side. Since quartiles divide the number of data points evenly, the range is generally not the same between adjacent quartiles (i.e. usually (Q3 - Q2) ≠ (Q2 - Q1)). Interquartile range (IQR) is defined as the difference between the 75th and 25th percentiles or Q3 - Q1. While the maximum and minimum also show the spread of the data, the upper and lower quartiles can provide more detailed information on the location of specific data points, the presence of outliers in the data, and the difference in spread between the middle 50% of the data and the outer data points.

↑ Return to Menu

Outlier in the context of Kurtosis

Kurtosis (from Greek: κυρτός (kyrtos or kurtos), meaning 'curved, arching') refers to the degree of tailedness in the probability distribution of a real-valued, random variable in probability theory and statistics. Similar to skewness, kurtosis provides insight into specific characteristics of a distribution. Various methods exist for quantifying kurtosis in theoretical distributions, and corresponding techniques allow estimation based on sample data from a population. It is important to note that different measures of kurtosis can yield varying interpretations.

The standard measure of a distribution's kurtosis, originating with Karl Pearson, is a scaled version of the fourth moment of the distribution. This number is related to the tails of the distribution, not its peak; hence, the sometimes-seen characterization of kurtosis as peakedness is incorrect. For this measure, higher kurtosis corresponds to greater extremity of deviations (or outliers), and not the configuration of data near the mean.

↑ Return to Menu