Chapter 4: Variance and Standard Deviation
Topics covered in this snack-sized chapter:
The goal of measuring variability is to obtain a measure of how spread out the scores are in a distribution.
A measure of variability accompanies a measure of central tendency as basic descriptive statistics for a set of scores.
Variability can be measured with:
The standard deviation is a measure of variability that describes an average distance of the scores from the mean.
It is also the square root of the variance.
It shows variation about the mean.
Population Standard Deviation:
 If x_{1}, x_{2}, …, x_{N} denote all N values from a population, then the population standard deviation is given by:
\[ \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2} \]
Where,
\mu is the mean of the population.
Sample Standard Deviation (S):
 If x_{1}, x_{2}, …, x_{n} denote the n values in a sample, then the sample standard deviation is given by:
\[ S = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]
Where,
\bar{x} is the mean of the sample.
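The difference between the two definitions is only the divisor (N for a population, n – 1 for a sample). A minimal Python sketch of both follows; the function names and the data values are assumptions made for illustration and do not come from the chapter.

```python
import math

def population_sd(values):
    """Population standard deviation: divide the sum of squares by N."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

def sample_sd(values):
    """Sample standard deviation: divide the sum of squares by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

# Hypothetical data, chosen only so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_sd(data))        # 2.0
print(round(sample_sd(data), 3))  # 2.138
```

For the same data, the sample formula always gives the larger result because it divides by the smaller number n – 1.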
Steps to Calculate Standard Deviation of a Sample
Calculate the mean of the sample.
Find the difference between each entry (x) and the mean. Square the deviations from the mean.
Sum the squares of the deviations from the mean.
Divide the sum by (n – 1) to get the variance.
Take the square root of the variance to get the standard deviation.
Example:
Find the Standard deviation of the data set given below:
1, 2, 3, 4, 5
Solution:
Step 1:
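Find the mean, the deviation of each value from the mean, and the squared deviations.
x    Mean (\bar{x})    x - \bar{x}    (x - \bar{x})^2
1    3                 -2             4
2    3                 -1             1
3    3                  0             0
4    3                  1             1
5    3                  2             4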

Step 2:
Find the sum of the squared deviations, \sum (x - \bar{x})^2:
4 + 1 + 0 + 1 + 4 = 10
Step 3:
Find n – 1.
n = 5
n – 1 = 5 – 1 = 4
Step 4:
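Now we get the standard deviation using the formula:
\[ S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{10}{4}} = \sqrt{2.5} \approx 1.58 \]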
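The steps of this example can be sketched in Python as a check on the arithmetic. This is only an illustrative sketch; the function name `sample_sd_steps` is an assumption, not something defined in the chapter.

```python
import math

def sample_sd_steps(values):
    """Follow the listed steps and return every intermediate result."""
    n = len(values)
    mean = sum(values) / n                            # mean of the sample
    squared_devs = [(x - mean) ** 2 for x in values]  # squared deviations
    total = sum(squared_devs)                         # sum of the squares
    variance = total / (n - 1)                        # divide by n - 1
    sd = math.sqrt(variance)                          # square root of the variance
    return mean, squared_devs, total, variance, sd

mean, squared_devs, total, variance, sd = sample_sd_steps([1, 2, 3, 4, 5])
print(mean, squared_devs, total, variance, round(sd, 2))
# 3.0 [4.0, 1.0, 0.0, 1.0, 4.0] 10.0 2.5 1.58
```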
Variance is the square of the standard deviation. It describes how much the scores vary from the mean.
Example:
Determine the standard deviation and variance from the following data:
30, 26 and 22
Here, n = 3.
Solution:



x            x - \bar{x}    (x - \bar{x})^2
30            4             16
26            0             0
22           -4             16
Mean = 26    Sum = 0        Sum = 32

The Variance:
\[ S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{32}{2} = 16 \]
The Standard Deviation:
\[ S = \sqrt{S^2} = \sqrt{16} = 4 \]
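The same result can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n – 1) divisor. This is a small sketch for verification only.

```python
import statistics

scores = [30, 26, 22]

sample_variance = statistics.variance(scores)  # sum of squares 32 divided by n - 1 = 2
sample_sd = statistics.stdev(scores)           # square root of the sample variance

print(f"{float(sample_variance):.2f}")  # 16.00
print(f"{float(sample_sd):.2f}")        # 4.00
```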
A greater standard deviation means more dispersion of the data.
Below are two data sets with the same mean but different standard deviations.
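As a hedged illustration of this point, the sketch below uses two hypothetical data sets (the values are assumptions chosen for this example, not data from the chapter) that share a mean of 50 but differ in spread.

```python
import statistics

# Hypothetical data sets with the same mean (50) and different spreads.
narrow = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

for name, values in (("narrow", narrow), ("wide", wide)):
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    print(f"{name}: mean = {float(mean):.1f}, SD = {sd:.2f}")
# narrow: mean = 50.0, SD = 1.58
# wide: mean = 50.0, SD = 15.81
```

The means match, but the second set has a standard deviation ten times larger because its values lie ten times farther from the mean.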
The coefficient of variation (CV) is a measure of relative variation.
It shows variation relative to the mean.
It is used to compare two or more sets of data measured in different units.
\[ CV = \frac{S}{\bar{x}} \times 100\% \]
Where,
S = Sample Standard Deviation.
\bar{x} = Sample Mean.
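The relative-variation measure described above, commonly called the coefficient of variation, can be sketched as the sample standard deviation divided by the sample mean, expressed as a percentage. The function name and both data sets below are assumptions made for illustration.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical measurements in different units.
heights_cm = [160, 170, 180]
weights_kg = [55, 70, 85]

print(f"{coefficient_of_variation(heights_cm):.2f}%")  # 5.88%
print(f"{coefficient_of_variation(weights_kg):.2f}%")  # 21.43%
```

Because the result is unit-free, the two figures can be compared directly: in this made-up example the weights vary more, relative to their mean, than the heights do.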
The z-score is the difference between the value and the mean, divided by the standard deviation.
 The value of the z-score tells exactly where a raw score is located relative to all the other scores in the distribution.
 A score that is located two standard deviations above the mean will have a z-score of +2.00.
The z-score is often called the standardized value.
The z-score is useful in identifying outliers (extreme values).
 Outliers are values in a data set that are located far from the mean.
 The larger the absolute value of the z-score, the larger the distance from the mean.
 A data value is commonly considered an outlier if its z-score is less than −3.0 or greater than +3.0.
The z-score is negative if a data value is less than the sample mean.
The z-score is positive if a data value is greater than the sample mean.
The z-score is zero if the data value is equal to the sample mean.
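A minimal sketch of the z-score calculation follows, reusing the data 22, 26, 30 from the variance example (mean 26, standard deviation 4). The function names are assumptions, and the outlier cutoff of 3 is a common rule of thumb assumed here rather than a fixed standard.

```python
import statistics

def z_scores(values):
    """z = (value - sample mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]

def is_outlier(z, threshold=3.0):
    """Flag a z-score whose absolute value exceeds the assumed cutoff."""
    return abs(z) > threshold

z = z_scores([22, 26, 30])
print([round(value, 2) for value in z])  # [-1.0, 0.0, 1.0]
print([is_outlier(value) for value in z])  # [False, False, False]
```

The signs match the rules above: the value below the mean has a negative z-score, the value equal to the mean has a z-score of zero, and the value above the mean has a positive z-score.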
Chebyshev's Theorem enables us to state that a minimum proportion of data values must be within a specified number of standard deviations of the mean.
This theorem applies to any data set regardless of the shape of the distribution of the data.
At least (1 − 1/z^2) of the data values must be within z standard deviations of the mean, where z is any value greater than 1.
For z = 2, 3 and 4, the theorem states that:
 At least 75% of the observations must be contained within 2 standard deviations of the mean.
 At least 88.89% of the observations must be contained within 3 standard deviations of the mean.
 At least 93.75% of the observations must be contained within 4 standard deviations of the mean.
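The percentages above follow from the bound 1 − 1/z^2. A short sketch that reproduces them is shown below; the function name is an assumption made for illustration.

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("z must be greater than 1")
    return 1 - 1 / z ** 2

for z in (2, 3, 4):
    print(z, f"{chebyshev_lower_bound(z):.2%}")
# 2 75.00%
# 3 88.89%
# 4 93.75%
```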
“Bell curve” refers to the shape that is created when a line is plotted using the data points for an item that meets the criteria of a “normal distribution”.
For data sets with a bell-shaped (approximately normal) distribution:
 Approximately 68% of the observations fall within 1 standard deviation (SD) of the mean.
 Approximately 95% of the observations fall within 2 SD of the mean.
 Approximately 99.7% of the observations fall within 3 SD of the mean.
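These approximate percentages can be checked against an exact normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8); the helper name `share_within` is an assumption.

```python
from statistics import NormalDist

def share_within(k):
    """Proportion of a normal distribution within k SD of the mean."""
    std_normal = NormalDist(mu=0, sigma=1)
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, f"{share_within(k):.1%}")
# 1 68.3%
# 2 95.4%
# 3 99.7%
```

These figures apply only to the normal model; for data of any other shape, Chebyshev's Theorem gives the guaranteed minimum instead.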
Applications:
Normal distributions are used in statistics to make inferences about the population mean when the sample size n is large.