Basics of Sampling Distribution




You might have heard the term census. A census is the process of collecting, compiling, organizing, analyzing, and publishing demographic information about an entire population (typically that of a country). In short, a census is a method of studying every member of a large population.

In practice, gathering information on an entire population is laborious, time-consuming, and costly. Therefore, we often collect a sample, analyze it, and draw conclusions about the whole population from that sample.

 

Sampling terminologies

For example, we often taste a bite of a fruit to check whether it is sweet before deciding whether to buy more. A mother in the kitchen tastes two or three grains of rice to decide whether the rice is cooked.

In all these scenarios, we take samples from the population for analysis.

Before studying sampling distributions, let us review some definitions from statistics.

  • Population: The population is the set of all the observations under study. A parameter is a characteristic of the population, such as its mean µ.
  • Sample: A sample is a subset, or a part, of the population. A statistic is a characteristic of the sample, such as the sample mean.

Let us look at the sampling distributions of the sample mean (M) and the sample proportion.


Sampling Distribution of Sample Mean

Suppose you draw a random sample of a sufficiently large size (n) from a population whose mean is µ and whose standard deviation is σ. Then the sampling distribution of the sample mean (M) is approximately normal, with mean µM = µ and standard deviation (σM) given by:

σM = σ / √n
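The relationship σM = σ / √n can be checked with a small simulation. The following is a minimal sketch, not a definitive implementation: the uniform population, the sample size of 50, and the number of repetitions are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 values drawn uniformly from [0, 100].
population = [random.uniform(0, 100) for _ in range(100_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 50               # size of each sample (assumed)
num_samples = 5_000  # number of repeated samples (assumed)

# Draw many samples (without replacement) and record each sample mean M.
sample_means = [
    statistics.fmean(random.sample(population, n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
predicted_sd = sigma / math.sqrt(n)

print(f"population mean      = {mu:.3f}")
print(f"mean of sample means = {mean_of_means:.3f}")
print(f"SD of sample means   = {sd_of_means:.3f}")
print(f"sigma / sqrt(n)      = {predicted_sd:.3f}")
```

Because the population is much larger than the sample, sampling without replacement makes very little difference here, so the simulated standard deviation should land close to σ / √n.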

 

Sampling Distribution of Sample Proportion

Suppose you take a random sample of size n from a population with proportion p such that n*p ≥ 10 and n*(1-p) ≥ 10. Then the sampling distribution of the sample proportion is approximately normal, with mean p and standard deviation given by:

√( p*(1-p) / n )
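As a rough check of the standard deviation √( p*(1-p) / n ) for the sample proportion, the sketch below simulates repeated samples. The values p = 0.3 and n = 100 are assumptions picked so that n*p = 30 and n*(1-p) = 70 both satisfy the condition above; `sample_proportion` is a hypothetical helper written for this example.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

p = 0.3              # assumed population proportion
n = 100              # assumed sample size: n*p = 30 and n*(1-p) = 70
num_samples = 5_000  # number of repeated samples (assumed)


def sample_proportion(n, p):
    """Draw n independent yes/no observations and return the share of 'yes'."""
    successes = sum(1 for _ in range(n) if random.random() < p)
    return successes / n


proportions = [sample_proportion(n, p) for _ in range(num_samples)]

mean_of_props = statistics.fmean(proportions)
sd_of_props = statistics.pstdev(proportions)
predicted_sd = math.sqrt(p * (1 - p) / n)

print(f"mean of sample proportions = {mean_of_props:.4f} (p = {p})")
print(f"SD of sample proportions   = {sd_of_props:.4f}")
print(f"sqrt(p*(1-p)/n)            = {predicted_sd:.4f}")
```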



To understand how a sampling distribution is formed, let us look at a formal definition followed by a step-by-step procedure.


Definition of a sampling distribution

Suppose a random sample of size (n) is taken from a population. Let T be a statistic based on this sample. Then the sampling distribution of T is the distribution of the values that T takes over all possible samples of the same size (n) taken from the same population.


Illustration

As an example, suppose a researcher takes a sample of size 5 from a population of size 10 and wants to study the properties of the sample mean (M). There are 252 (= 10C5) ways to select 5 observations out of 10 when observations are not repeated and their order does not matter. Hence, 252 samples of size 5 are possible for the given population of size 10.

We can find the sampling distribution of sample mean (M) using the following steps:

  1. Select a random sample of size 5 from the population.
  2. Then calculate the sample mean (M) for the selected sample.
  3. Repeat steps 1 and 2 for each of the 252 distinct samples to get 252 sample means (different samples may give the same mean).
  4. Then formulate the distribution of sample means (M) based on these 252 values.

The distribution that we get here is the sampling distribution of sample mean (M).
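The steps above can be sketched in code by enumerating every possible sample instead of drawing them one at a time. This is a minimal sketch: the ten population values below are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
from math import comb
from statistics import fmean

# Hypothetical population of size 10 (values chosen only for illustration).
population = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
sample_size = 5

# Steps 1 to 3: list every distinct sample of size 5 and compute its mean M.
all_samples = list(combinations(population, sample_size))
sample_means = [fmean(sample) for sample in all_samples]

# Step 4: tabulate how often each value of M occurs.
distribution = Counter(sample_means)

print(f"number of samples: {len(all_samples)} (10C5 = {comb(10, 5)})")
print(f"population mean: {fmean(population)}")
print(f"mean of the sample means: {fmean(sample_means):.4f}")
for value in sorted(distribution):
    share = distribution[value] / len(sample_means)
    print(f"M = {value:5.1f}   relative frequency = {share:.4f}")
```

The table of relative frequencies printed at the end is the sampling distribution of M for this small population.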

 

Use of the sampling distribution in Central Limit Theorem

Let X1, X2, …, Xn be a random sample of size n drawn from a population with mean µ and standard deviation σ. Let M be the sample mean, a function of (X1, X2, …, Xn), defined as:

M = (X1 + X2 + … + Xn) / n



As the sample size (n) increases, the sampling distribution of the sample mean is increasingly well approximated by a normal distribution with mean µ and standard deviation given by the formula:

σM = σ / √n



The central limit theorem allows us to use a normal distribution for the sample mean even when the population itself is not normal, which simplifies many problems in the field of statistical inference.
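The convergence described by the central limit theorem can be illustrated with a strongly skewed population. The sketch below is only an illustration under stated assumptions: observations are drawn from an exponential distribution with rate 1 (so µ = 1 and σ = 1), the sample sizes are arbitrary, and `skewness` is a hypothetical helper that measures asymmetry (it is 0 for a perfectly symmetric distribution such as the normal).

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable


def skewness(values):
    """Average cubed z-score; 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu = 1.0
sigma = 1.0
num_samples = 4_000  # number of repeated samples per sample size (assumed)

results = {}
for n in (1, 5, 30, 100):
    # Sample mean M for each of num_samples independent samples of size n.
    means = [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]
    results[n] = {
        "mean": statistics.fmean(means),
        "sd": statistics.pstdev(means),
        "predicted_sd": sigma / math.sqrt(n),
        "skew": skewness(means),
    }
    r = results[n]
    print(
        f"n = {n:3d}  mean = {r['mean']:.3f}  sd = {r['sd']:.3f}  "
        f"sigma/sqrt(n) = {r['predicted_sd']:.3f}  skewness = {r['skew']:.3f}"
    )
```

As n grows, the skewness of the sample means should shrink toward 0 while their standard deviation tracks σ / √n, which is the behaviour the theorem describes.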