Thursday, April 17, 2025

33. Normal Distribution

To understand Normal Distribution, one first needs to look at Central Limit Theorem.

Central Limit Theorem:

In probability theory, the central limit theorem (CLT) states that the distribution of the sample mean will approximate a normal distribution (i.e., a bell curve) as the sample size becomes larger, regardless of the shape of the population's actual distribution.

Put another way, the CLT is a statistical premise that, given a sufficiently large sample size from a population with a finite level of variance, the mean of all sampled variables from the same population will be approximately equal to the mean of the whole population. Furthermore, the sample means will approximate a normal distribution, with their variance approximately equal to the population variance divided by the sample size, shrinking as the sample size grows.
As a general rule, sample sizes of 30 or more are typically deemed sufficient for the CLT to hold, meaning that the distribution of the sample means is fairly normally distributed. In addition, the more samples one takes, the more the graphed results should take the shape of a normal distribution.

The central limit theorem is often used in conjunction with the law of large numbers, which states that the average of the sample means will come closer to equaling the population mean as the sample size grows. This concept can be extremely useful in accurately predicting the characteristics of very large populations.

In general, as the sample size from the population increases, the sample mean clusters more closely around the population mean, with decreasing variance. Thus, as the sample size approaches infinity, the sample means approximate a normal distribution with mean μ and variance σ²/n. The skewness of the population distribution does not affect the distribution of the sample means as the sample size increases. Therefore, the central limit theorem indicates that if the sample size is sufficiently large, the means of samples obtained using random sampling with replacement are distributed normally with mean μ and variance σ²/n, regardless of the population distribution.
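A quick simulation can illustrate this. The sketch below uses an exponential population, chosen here only because it is strongly right-skewed, with μ = 1 and σ² = 1; the sample size and number of trials are arbitrary choices:

```python
import random
import statistics

random.seed(0)

# Population: exponential with rate 1, so mu = 1 and sigma^2 = 1
# (a heavily right-skewed distribution, nothing like a bell curve).
n = 50          # size of each sample
trials = 10000  # number of samples drawn

# Draw many samples of size n and record each sample mean.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(trials)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

print(mean_of_means)  # close to mu = 1
print(var_of_means)   # close to sigma^2 / n = 1/50 = 0.02
```

Despite the skewed population, the sample means center on μ with variance σ²/n, exactly as the theorem predicts.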


---

What Is a Normal Distribution?

Normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean. The normal distribution appears as a "bell curve" when graphed.

In a normal distribution, the mean (average), median (midpoint), and mode (most frequent observation) are equal. This shared value sits at the peak, or highest point, of the curve. The distribution then falls symmetrically around the mean, with its width defined by the standard deviation.

The normal distribution is one type of symmetrical distribution. Symmetrical distributions occur when a dividing line produces two mirror images. Not all symmetrical distributions are normal, however: some symmetric data appear as two humps or a series of hills rather than the single bell curve, and symmetric distributions such as the Student's t, Cauchy, and logistic are not normal.

For all normal distributions, about 68.3% of the observations will appear within plus or minus one standard deviation of the mean; 95.4% will fall within +/- two standard deviations; and 99.7% within +/- three standard deviations.

This fact is sometimes called the "empirical rule," a heuristic that describes where most of the data in a normal distribution will appear. Data falling outside three standard deviations ("3-sigma") would signify rare occurrences.
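These percentages can be checked directly from the normal CDF, since P(|Z| ≤ k) = erf(k/√2) for a standard normal variable Z:

```python
import math

# Probability that a standard normal variable falls within k standard
# deviations of the mean: P(|Z| <= k) = erf(k / sqrt(2)).
probs = {k: math.erf(k / math.sqrt(2)) for k in (1, 2, 3)}
for k, p in probs.items():
    print(f"within +/- {k} sigma: {p:.4f}")
# within +/- 1 sigma: 0.6827
# within +/- 2 sigma: 0.9545
# within +/- 3 sigma: 0.9973
```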

 

Asymptotic Nature: The tails of the normal distribution curve approach, but never touch, the horizontal axis. This implies that all possible values of the variable, no matter how extreme, have a non-zero probability of occurring.

Skewness measures the degree of symmetry of a distribution. The normal distribution is symmetric and has a skewness of zero. If the distribution of a data set instead has a skewness less than zero, or negative skewness (left-skewness), then the left tail of the distribution is longer than the right tail; positive skewness (right-skewness) implies that the right tail of the distribution is longer than the left.


Kurtosis measures the thickness of the tails of a distribution relative to the tails of the normal distribution. The normal distribution has a kurtosis equal to 3.0. Distributions with kurtosis greater than 3.0 exhibit tail data exceeding the tails of the normal distribution (e.g., observations five or more standard deviations from the mean).

This excess kurtosis is known in statistics as leptokurtic, but is more colloquially known as "fat tails." The occurrence of fat tails in financial markets describes what is known as tail risk. Distributions with kurtosis less than 3.0 (platykurtic) exhibit tails that are generally less extreme ("skinnier") than the tails of the normal distribution.
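A small sketch can make these moments concrete. It computes skewness and kurtosis straight from their definitions; the "fat-tailed" data below is a hypothetical mixture (normal draws with occasional 3x shocks), used only to produce leptokurtic samples:

```python
import random
import statistics

random.seed(1)

def skewness(xs):
    """Third standardized moment; 0 for a symmetric distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

def kurtosis(xs):
    """Fourth standardized moment; 3.0 for the normal distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 4 for x in xs)

normal = [random.gauss(0, 1) for _ in range(100_000)]
fat_tailed = [random.gauss(0, 1) * random.choice([1, 1, 1, 3])  # occasional 3x shocks
              for _ in range(100_000)]

print(round(skewness(normal), 2))     # near 0: symmetric
print(round(kurtosis(normal), 2))     # near 3.0
print(round(kurtosis(fat_tailed), 2)) # well above 3.0: leptokurtic
```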

Although normal distribution is a statistical concept, its applications in finance can be limited because financial phenomena—such as expected stock-market returns—do not fall neatly within a normal distribution. Prices tend to follow more of a log-normal distribution, right-skewed and with fatter tails.





Nature and Normal Distributions

The normal distribution is technically known as the Gaussian distribution; however, it took on the name "normal" following scientific publications in the 19th century showing that many natural phenomena appeared to "deviate normally" from the mean. The idea of "normal variability" was popularized as the "normal curve" by the naturalist Sir Francis Galton in his 1889 work, Natural Inheritance.

Many naturally occurring phenomena appear to be normally distributed. For example, the average height of a human is roughly 175 cm (5' 9"), counting both males and females.

Also for example, if we randomly sampled 100 individuals, we would expect to see a normal distribution frequency curve for many continuous variables, such as IQ, height, weight, and blood pressure.

 

As with any probability distribution, the normal distribution describes how the values of a variable are distributed. It is the most important probability distribution in statistics because it accurately describes the distribution of values for many natural phenomena. Characteristics that are the sum of many independent processes frequently follow normal distributions. 



Another way of looking at it, from Reddit: things in nature don't "conform" to the normal distribution. The normal distribution "emerges" from the addition of a lot of small constituent events when you observe their sum. That is the CLT at work. It happens precisely because things are random. If the events are numerous enough, independent enough, and sufficiently closely distributed, you'll observe that their sum looks as though it comes from a normal distribution, or close to one.
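A minimal sketch of that emergence, summing many small uniform "events" (the event count and number of observations here are arbitrary choices):

```python
import math
import random

random.seed(2)

# Each observation is the sum of 200 small independent "events",
# each uniform on [0, 1]; no single event is bell-shaped.
n_events = 200
sums = [sum(random.random() for _ in range(n_events)) for _ in range(10_000)]

# Standardize using the known mean and variance of the sum:
# mean = n/2 and variance = n/12 for uniform(0, 1) summands.
mu = n_events / 2
sigma = math.sqrt(n_events / 12)
z = [(s - mu) / sigma for s in sums]

within_1sd = sum(abs(v) <= 1 for v in z) / len(z)
print(round(within_1sd, 3))  # close to 0.683, as a normal distribution predicts
```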

 

---



 

Today, the normal distribution is ubiquitous in statistical practice for several reasons:

  • It serves as a reference distribution against which other distributions are compared.
  • The mathematical tractability makes it ideal for building more complex statistical models.
  • It underpins many machine learning algorithms and artificial intelligence systems.
  • It provides a foundation for robust statistical inference even when data slightly violates normality assumptions.

The extraordinary staying power of the normal distribution over centuries testifies to its fundamental nature in describing variation across countless domains.

 

The enduring legacy of the normal distribution isn’t that it applies perfectly to all situations—it doesn’t—but rather that it provides an elegant, mathematically tractable baseline from which to understand variation. As we continue to collect and analyze ever-larger and more complex datasets, the humble bell curve will undoubtedly remain a crucial reference point in our statistical toolkit.



--


And perhaps for a later day: (sort of setting it up as a future challenge to make sense of some of the following!)

Edwin Thompson Jaynes put it very beautifully that the max entropy distribution is “uniquely determined as the one which is maximally noncommittal with regard to missing information, in that it agrees with what is known, but expresses maximum uncertainty with respect to all other matters”. Therefore, this is the most principled choice. Here is a list of probability distributions and their corresponding maximum entropy constraints, taken from Wikipedia.



Alright, but what does the normal distribution have to do with these? It turns out that the normal distribution is the distribution that maximizes information entropy under the constraint of a fixed mean μ and variance σ² of a random variable X. So, if we know the mean and variance of some data, out of all possible probability distributions, the Gaussian is the one that maximizes information entropy, or, equivalently, the one that builds in the fewest of our assumptions/biases. This principle may be viewed as expressing epistemic modesty or maximal ignorance because it makes the least strong claim on the distribution.
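As a sketch of the standard result behind that claim: among all densities p on the real line with mean μ and variance σ², the differential entropy

```latex
h(p) = -\int_{-\infty}^{\infty} p(x) \ln p(x)\, dx,
\qquad
\max_{\substack{\mathbb{E}[X] = \mu \\ \operatorname{Var}(X) = \sigma^2}} h(p)
= \frac{1}{2} \ln\!\bigl( 2\pi e\, \sigma^2 \bigr),
```

is maximized by the Gaussian density p*(x) = (1/√(2πσ²)) exp(−(x−μ)²/(2σ²)), and the maximum depends only on σ², not on μ.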


The last bit above is taken from here.
