Correlation Measurements With Microsoft Excel

May 15, 2008, 07:31

Stephen L Nelson

Microsoft Excel supplies several easy-to-use statistical functions for just such a purpose, says bestselling computer book author Stephen L. Nelson.

Excel provides useful statistical functions for measuring correlation between two variables.

As a reminder, the benefit of using a correlation coefficient to measure the relationship between two variables, as opposed to using covariance, is that the unit of measurement doesn’t matter.

But a caution: remember that correlation does not show causation. For example, you could easily show that as the number of ice cream cones consumed during a year increases, so does the number of drownings. This does not mean that eating ice cream causes people to drown. More likely, both variables are independently related to a third variable: temperature. Correlation is also symmetrical, so you get the same coefficient if you switch the variables. Don’t calculate a correlation coefficient if you manipulated one of the variables; use linear regression instead.
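Two of the properties mentioned so far, unit independence and symmetry, are easy to check outside Excel. The following Python sketch uses made-up height and weight values (hypothetical, not taken from this article) and hand-written helper functions. It illustrates the idea and makes no claim about how Excel computes its results.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sample_covariance(xs, ys):
    # Average product of paired deviations, using the n - 1 divisor.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    # Covariance rescaled by the spread of each variable.
    return sample_covariance(xs, ys) / sqrt(
        sample_covariance(xs, xs) * sample_covariance(ys, ys)
    )

# Hypothetical measurements, for illustration only.
heights_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weights_kg = [55.0, 68.0, 70.0, 80.0, 85.0]
heights_cm = [h * 100 for h in heights_m]  # same data, different unit

cov_m = sample_covariance(heights_m, weights_kg)
cov_cm = sample_covariance(heights_cm, weights_kg)
r_m = correlation(heights_m, weights_kg)
r_cm = correlation(heights_cm, weights_kg)
r_swapped = correlation(weights_kg, heights_m)

print("covariance (m): ", cov_m)
print("covariance (cm):", cov_cm)   # about 100 times larger
print("correlation (m): ", r_m)
print("correlation (cm):", r_cm)    # unchanged by the unit switch
print("correlation, variables swapped:", r_swapped)
```

Switching from meters to centimeters multiplies the covariance by 100 but leaves the correlation coefficient where it was, and swapping the two variables gives the same coefficient.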

CORREL

You use the CORREL function in Excel to determine whether two data sets are related and, if so, how strongly. The correlation coefficient ranges from +1, indicating a perfect positive linear relationship, to –1, indicating a perfect negative linear relationship. To calculate a correlation coefficient for a sample, Excel uses the covariance of the samples and the standard deviation of each sample. To use the CORREL function in Excel, just select the two sets of data to use as the arguments and use the following syntax:

=CORREL(data set 1,data set 2)

For example, suppose you have a set of preliminary test scores for a sample of employees in column A and a set of performance feedback scores in column B, as shown in Figure 4-6. To find out whether the two sets are related and, if so, how strongly, you can use Excel to find the correlation coefficient for the samples.

The function returns the value 0.87, indicating that the sets are positively related (as the value of one goes up, the value of the other also increases), but the relationship isn’t perfect.
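The calculation described above, covariance divided by the product of the two standard deviations, can be sketched in Python as follows. The `correl` helper and the score lists are hypothetical. The Figure 4-6 data isn’t reproduced here, so this sketch isn’t expected to return 0.87.

```python
from statistics import mean, stdev

def correl(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    if len(xs) != len(ys):
        raise ValueError("both data sets need the same number of values")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

# Hypothetical scores, NOT the Figure 4-6 data.
test_scores = [62, 70, 75, 81, 90, 94]
feedback_scores = [3.1, 3.4, 3.3, 4.0, 4.4, 4.6]

r = correl(test_scores, feedback_scores)
r_up = correl([1, 2, 3, 4], [10, 20, 30, 40])    # points on a rising line
r_down = correl([1, 2, 3, 4], [40, 30, 20, 10])  # points on a falling line

print("scores:", r)
print("rising line:", r_up)
print("falling line:", r_down)
```

The two straight-line cases land on the ends of the range (+1 and –1, up to floating-point rounding), while the score data falls in between.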

PEARSON

The Pearson product-moment correlation coefficient function, PEARSON, uses a different equation for calculating the correlation coefficient, one that doesn’t require computing each deviation from the mean. The result still ranges from +1, indicating a perfect positive linear relationship, to –1, indicating a perfect negative linear relationship. The PEARSON function uses the following syntax:

=PEARSON(data set 1,data set 2)

Using the PEARSON function on the data shown in Figure 4-6 to compute the correlation coefficient returns the same value as the CORREL function does.
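To see why an equation that skips the deviations from the mean can return the same value, the sketch below compares the standard textbook “raw sums” form of Pearson’s r with the deviation-based form. Taking the raw-sums form as the “different equation” meant here is an assumption, and the function names and data are hypothetical. Neither function is presented as Excel’s actual implementation.

```python
from math import sqrt

def pearson_from_sums(xs, ys):
    # Raw-sums form: needs only running totals, no deviations from the mean.
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def correl_from_deviations(xs, ys):
    # Deviation form: subtract each mean first, then combine the products.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores, NOT the Figure 4-6 data.
test_scores = [62, 70, 75, 81, 90, 94]
feedback_scores = [3.1, 3.4, 3.3, 4.0, 4.4, 4.6]

r_sums = pearson_from_sums(test_scores, feedback_scores)
r_devs = correl_from_deviations(test_scores, feedback_scores)
print(r_sums, r_devs)  # the two forms agree up to floating-point rounding
```

The two forms are algebraically equivalent, so any difference between them comes down to rounding.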

RSQ

The RSQ function calculates the square of the Pearson product-moment correlation coefficient for the data points in the two data sets. You can interpret the r-squared value as the proportion of the variance in y attributable to the variance in x. The RSQ function uses the following syntax:

=RSQ(data set 1,data set 2)
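The “proportion of variance” reading can be checked numerically. The Python sketch below squares the correlation coefficient, then compares it with the share of the variation in y that a least-squares line through the points accounts for. The helper names and data are hypothetical and chosen only for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def explained_share(xs, ys):
    # Fit y = intercept + slope * x by least squares, then compare the
    # leftover (residual) variation in y with the total variation in y.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data, for illustration only.
x_values = [62, 70, 75, 81, 90, 94]
y_values = [3.1, 3.4, 3.3, 4.0, 4.4, 4.6]

rsq = pearson_r(x_values, y_values) ** 2
share = explained_share(x_values, y_values)
print(rsq, share)  # equal up to floating-point rounding
```

Because the correlation coefficient is symmetrical, its square is the same whichever data set is treated as y.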