Chapter 9: Correlation and Regression
Topics covered in this snack-sized chapter:
A correlation is a single number that describes the degree of relationship between two variables.
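It is often measured with Pearson’s correlation coefficient, which is represented by the letter r:
 r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n \sum x^{2} - (\sum x)^{2}][n \sum y^{2} - (\sum y)^{2}]}}
where n is the number of (x, y) pairs of values.
In this formula:
 \sum xy: multiply each x by its y and sum the products
 \sum x: sum all the x scores
 \sum y: sum all the y scores
 \sum x^{2}: square all the x scores and sum
 \sum y^{2}: square all the y scores and sum
 (\sum x)^{2}: square the sum of the x scores
 (\sum y)^{2}: square the sum of the y scores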
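The sums listed above can be combined into r as in the following minimal sketch. It assumes the usual computational form of Pearson's formula, r = (nΣxy − ΣxΣy) / sqrt([nΣx² − (Σx)²][nΣy² − (Σy)²]). The function name and the sample data are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from raw sums (computational form, assumed here)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    sum_x = sum(xs)                                # sum all the x scores
    sum_y = sum(ys)                                # sum all the y scores
    sum_xy = sum(x * y for x, y in zip(xs, ys))    # multiply x by y and sum
    sum_x2 = sum(x * x for x in xs)                # square all x and sum
    sum_y2 = sum(y * y for y in ys)                # square all y and sum
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero if either variable is constant; r is undefined then.
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data: hours studied and test scores
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
print(round(r, 4))  # about 0.7746
```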

The different types of correlation are:
When the values of two variables x and y move in the same direction, the correlation is said to be positive.
That is, in a positive correlation, when there is an increase in x, there will also be an increase in y. Similarly, when there is a decrease in x, there will also be a decrease in y.
When the values of two variables x and y move in opposite directions, we say the correlation is negative.
That is, in a negative correlation, when there is an increase in x, there will be a decrease in y. Similarly, when there is a decrease in x, there will be an increase in y.
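One way to see the direction of a correlation is to sum the products of the deviations of x and y from their means: the sum is positive when the variables move together and negative when they move in opposite directions. This is only a sketch; the function name and the data are hypothetical.

```python
def co_movement(xs, ys):
    """Sum of products of deviations from the means.

    Its sign is the same as the sign of the correlation coefficient r.
    """
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

x = [1, 2, 3, 4, 5]
rising = [10, 12, 15, 19, 24]    # y increases as x increases
falling = [24, 19, 15, 12, 10]   # y decreases as x increases

print(co_movement(x, rising) > 0)   # True: positive correlation
print(co_movement(x, falling) < 0)  # True: negative correlation
```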
When a change in one variable results in a constant change in the other variable, we say the correlation is linear.
When there is a perfect linear correlation, the plotted points fall on a straight line.
When the amount of change in one variable is not in a constant ratio to the change in the other variable, we say that the correlation is nonlinear.
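The "constant ratio" idea can be checked directly by comparing the ratio of change in y to change in x between consecutive points. The sketch below is an illustration only; the function name and tolerance are assumptions, and real data with noise would need a statistical test rather than an exact comparison.

```python
def has_constant_ratio(xs, ys, tol=1e-9):
    """True if the change in y per unit change in x is the same between all
    consecutive points (consecutive x values must differ)."""
    points = list(zip(xs, ys))
    ratios = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in zip(points, points[1:])]
    return all(abs(ratio - ratios[0]) <= tol for ratio in ratios)

xs = [1, 2, 3, 4]
print(has_constant_ratio(xs, [3, 5, 7, 9]))    # True: y = 1 + 2x is linear
print(has_constant_ratio(xs, [1, 4, 9, 16]))   # False: y = x^2 is nonlinear
```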
If there are only two variables under study, the correlation is said to be simple.
When one variable is related to a number of other variables, the correlation is not simple. It is multiple if there is one variable on one side and a set of variables on the other side.
Example:
The relationship of yield with both rainfall and fertilizer together is a multiple correlation.
The correlation is partial if we study the relationship between two variables keeping all other variables constant.
Example:
The relationship between yield and rainfall at a constant temperature is a partial correlation.
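For three variables, a partial correlation can be computed from the three simple correlations. The sketch below uses the standard first-order formula r_{xy.z} = (r_{xy} − r_{xz} r_{yz}) / sqrt((1 − r_{xz}²)(1 − r_{yz}²)); the function name and the correlation values are invented for illustration and do not come from real data.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical simple correlations:
r_yield_rain = 0.8   # yield vs rainfall
r_yield_temp = 0.5   # yield vs temperature
r_rain_temp = 0.4    # rainfall vs temperature

# Correlation of yield and rainfall with temperature held constant
r_partial = partial_r(r_yield_rain, r_yield_temp, r_rain_temp)
print(round(r_partial, 4))  # about 0.7559
```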
Regression is a statistical measure that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables).
Linear regression analyzes the relationship between two variables, x and y, with a straight line:
 y = a + bx
Exponential regression takes the input signal and fits an exponential function to it, for example
 y = a e^{bt}
where t is the variable along the x-axis.
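A common way to fit an exponential is to take logarithms and fit a straight line. The sketch below assumes the exponential has the form y = a e^{bt}, so that ln y = ln a + b t, and it requires every y to be positive. This is one possible approach, not necessarily the one intended here; it minimizes the error in ln y rather than in y. The function name and data are hypothetical.

```python
from math import exp, log

def fit_exponential(ts, ys):
    """Fit y = a * exp(b * t) by least squares on the points (t, ln y)."""
    n = len(ts)
    logs = [log(y) for y in ys]          # requires y > 0
    mean_t = sum(ts) / n
    mean_log = sum(logs) / n
    # Slope of the straight line through (t, ln y)
    b = (sum((t - mean_t) * (l - mean_log) for t, l in zip(ts, logs))
         / sum((t - mean_t) ** 2 for t in ts))
    # Intercept of that line is ln a
    a = exp(mean_log - b * mean_t)
    return a, b

# Synthetic, noise-free data generated from y = 2 * e^(0.5 t)
ts = [0, 1, 2, 3]
ys = [2 * exp(0.5 * t) for t in ts]
a, b = fit_exponential(ts, ys)
print(round(a, 6), round(b, 6))  # recovers roughly 2.0 and 0.5
```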
A quadratic function is a function f(x) of the form f(x) = ax^{2} + bx + c for fixed constants a, b, and c, with a not equal to 0.
Simple linear regression relates one dependent variable (Y) to one independent variable (X).
The model is:
 y_{i} = a + b x_{i} + e_{i}
Where,
y_{i} = Values of the dependent variable
x_{i} = Values of the independent variable
a, b = “Regression coefficients” (what we want to find)
e_{i} = Residual or error
The coefficient of determination is the proportion of the total variation in the dependent variable that is explained or accounted for by the variation in the independent variable.
 It is the square of the coefficient of correlation.
 It does not give any information about the direction of the relationship between the variables.
A line of best fit is a straight line that best represents the data on a scatter plot.
We measure the fit with the coefficient of determination, r^{2}.
r^{2} is the proportion of variation in Y explained by the regression.
 0 indicates no relationship, 1 indicates perfect relationship.
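The coefficient of determination can be sketched as 1 minus the ratio of unexplained variation to total variation around a least-squares line. The example also shows that two data sets with opposite directions can share the same r^{2}, which is why it says nothing about direction. The function name and data are made up for illustration.

```python
def r_squared(xs, ys):
    """Proportion of variation in y explained by the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                 # slope of the fitted line
    a = mean_y - b * mean_x       # intercept of the fitted line
    # Unexplained variation: squared distances from the fitted line
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared distances from the mean of y
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]      # tends to increase with x
y_down = [5, 4, 5, 4, 2]    # same values reversed, tends to decrease

print(round(r_squared(x, y_up), 4))    # 0.6
print(round(r_squared(x, y_down), 4))  # 0.6 as well, despite the opposite direction
```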
Find the individual residuals or errors:
 e_{i} = y_{i} - \hat{y}_{i}
Then, the sum of all the residuals is
 \sum e_{i} = \sum (y_{i} - \hat{y}_{i})
Where,
y_{i} = Observed value of the dependent variable for the i^{th} observation.
\hat{y}_{i} = Estimated value of the dependent variable for the i^{th} observation.
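Residuals are easy to compute once a line is given. The sketch below takes each residual as the observed value minus the estimated value, and reports both the plain sum and the sum of squares. For a least-squares line with an intercept the plain sum comes out as zero, and the sum of squares is the quantity that least squares minimizes. The line y = 2.2 + 0.6x used here is the least-squares line for this made-up data; the names are hypothetical.

```python
def residuals(xs, ys, a, b):
    """Observed y minus estimated y for the line y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

errors = residuals(x, y, a=2.2, b=0.6)
total = sum(errors)                    # sum of all the residuals
sse = sum(e ** 2 for e in errors)      # sum of the squared residuals

print([round(e, 2) for e in errors])   # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(total) < 1e-9)               # True: the residuals cancel out
print(round(sse, 4))                   # 2.4
```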
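The slope of the regression line is calculated by this formula:
 b = \frac{n \sum xy - \sum x \sum y}{n \sum x^{2} - (\sum x)^{2}}
Where,
x = Values of the independent variable
y = Values of the dependent variable
n = Number of (x, y) pairs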
Y-Intercept for the Estimated Regression Equation
 a = \bar{y} - b \bar{x}
Where,
\bar{x} = Mean value for independent variable
\bar{y} = Mean value for dependent variable
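The slope and intercept can be computed together, as in this minimal sketch. It assumes the computational form of the slope, b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²), and the intercept a = (mean of y) − b × (mean of x). The function name and the data are invented for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope from the raw sums; undefined if all x values are equal
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept from the means of y and x
    a = sum_y / n - b * (sum_x / n)
    return a, b

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
print(round(a, 4), round(b, 4))   # 2.2 0.6

# Estimated value of y for a new x, using the fitted line
estimate = a + b * 6
print(round(estimate, 4))         # 5.8
```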
Most of the points lie on the line: