DP IB Maths: AI HL

Revision Notes

4.2.2 Correlation Coefficients

PMCC

What is Pearson’s product-moment correlation coefficient?

  • Pearson’s product-moment correlation coefficient (PMCC) gives a numerical value to the strength of the linear relationship in bivariate data
  • The PMCC of a sample is denoted by the letter r
    • r can take any value such that $-1 \le r \le 1$
    • A positive value of r describes positive correlation
    • A negative value of r describes negative correlation
    • r = 0 means there is no linear correlation
    • r = 1 means perfect positive linear correlation
    • r = -1 means perfect negative linear correlation
    • The closer r is to 1 or -1, the stronger the correlation

[Figure: PMCC diagram]

How do I calculate Pearson’s product-moment correlation coefficient (PMCC)?

  • You will be expected to use the statistics mode on your GDC to calculate the PMCC
  • The formula can be useful to deepen your understanding

$$r = \frac{S_{xy}}{S_x S_y}$$

      • $S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)$ is linked to the covariance
      • $S_x = \sqrt{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}$ and $S_y = \sqrt{\sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2}$ are linked to the variances
    • You do not need to learn this formula, as you will be expected to use your GDC
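To see how the formula above fits together, here is a minimal Python sketch that computes r directly from the three sums. The function name `pmcc` and the data values are made up for illustration; in the exam you would still use your GDC.

```python
from math import sqrt

def pmcc(xs, ys):
    """Pearson's r computed as S_xy / (S_x * S_y)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_x = sqrt(sum(x * x for x in xs) - sum_x ** 2 / n)
    s_y = sqrt(sum(y * y for y in ys) - sum_y ** 2 / n)
    return s_xy / (s_x * s_y)

# Illustrative data only (not taken from these notes)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pmcc(xs, ys)
print(round(r, 3))  # 0.775
```

Here $S_{xy} = 6$, $S_x = \sqrt{10}$ and $S_y = \sqrt{6}$, so $r = 6 / \sqrt{60} \approx 0.775$.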

When does the PMCC suggest there is a linear relationship?

  • Critical values of r indicate when the PMCC would suggest there is a linear relationship
    • In your exam you will be given critical values where appropriate
    • Critical values will depend on the size of the sample
  • If the absolute value of the PMCC is bigger than the critical value then this suggests a linear model is appropriate
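The comparison with a critical value can be sketched in a few lines of Python. The function name and the numbers below are hypothetical; in an exam the critical value is given to you and depends on the sample size.

```python
def suggests_linear_model(r, critical_value):
    """True if the absolute value of r exceeds the critical value."""
    return abs(r) > critical_value

# Hypothetical critical value, for illustration only
critical_value = 0.6
print(suggests_linear_model(-0.82, critical_value))  # True
print(suggests_linear_model(0.41, critical_value))   # False
```

Note that a strongly negative r passes the check just as a strongly positive one does, because only the absolute value is compared.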

Spearman’s Rank

What is Spearman’s rank correlation coefficient?

  • Spearman's rank correlation coefficient is a measure of how well the relationship between two variables can be described using a monotonic function
    • Monotonic means the points are either always increasing or always decreasing
    • It can be used to measure correlation in linear models, but it can also be used to assess a non-linear relationship
  • Each data value is ranked, from biggest to smallest or from smallest to biggest
    • For n data values, they are ranked from 1 to n
    • It doesn't matter whether variables are ranked from biggest to smallest or smallest to biggest, but they must be ranked in the same order for both variables
  • Spearman’s rank of a sample is denoted by $r_s$
    • $r_s$ can take any value such that $-1 \le r_s \le 1$
    • A positive value of $r_s$ describes a degree of agreement between the rankings
    • A negative value of $r_s$ describes a degree of disagreement between the rankings
    • $r_s = 0$ means the data shows no monotonic behaviour
    • $r_s = 1$ means the rankings are in complete agreement: the data is strictly increasing
      • An increase in one variable means an increase in the other
    • $r_s = -1$ means the rankings are in complete disagreement: the data is strictly decreasing
      • An increase in one variable means a decrease in the other
    • The closer to 1 or -1 the stronger the correlation of the rankings

[Figure: Spearman's rank diagram]

How do I calculate Spearman’s rank correlation coefficient?

  • Rank each set of data independently
    • 1 to n for the x-values
    • 1 to n for the y-values
  • If some values are equal then give each the average of the ranks they would occupy
    • For example: if the 3rd, 4th and 5th highest values are equal then give each the ranking of 4
      • $\frac{3 + 4 + 5}{3} = 4$
  • Calculate the PMCC of the rankings using your GDC
    • This value is Spearman's rank correlation coefficient
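The steps above can be sketched in Python as follows. This is only an illustration: the function names and data are made up, and the sketch ranks from smallest to largest (either direction works, provided both variables use the same one).

```python
from math import sqrt

def average_ranks(values):
    """Rank from smallest (1) to largest (n); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value is tied with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end + 2) / 2  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pmcc(xs, ys):
    """Pearson's r computed as S_xy / (S_x * S_y)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_x = sqrt(sum(x * x for x in xs) - sum_x ** 2 / n)
    s_y = sqrt(sum(y * y for y in ys) - sum_y ** 2 / n)
    return s_xy / (s_x * s_y)

def spearman(xs, ys):
    """Spearman's rank: the PMCC of the two sets of ranks."""
    return pmcc(average_ranks(xs), average_ranks(ys))

# The 3rd, 4th and 5th values are tied, so each gets rank (3 + 4 + 5) / 3 = 4
tied = average_ranks([1, 2, 7, 7, 7, 9])
print(tied)  # [1.0, 2.0, 4.0, 4.0, 4.0, 6.0]

# Illustrative data only (not taken from these notes)
r_s = spearman([3, 1, 4, 2, 5], [30, 10, 45, 20, 40])
print(round(r_s, 3))  # 0.9
```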

Appropriateness & Limitations

Which correlation coefficient should I use?

  • Pearson’s PMCC tests for a linear relationship between two variables
    • It will not tell you if the variables have a non-linear relationship
      • Such as exponential growth
    • Use this if you are interested in a linear relationship
  • Spearman’s rank tests for a monotonic relationship (always increasing or always decreasing) between two variables
    • It will not tell you what function can be used to model the relationship
      • Both linear relationships and exponential relationships can be monotonic
    • Use this if you think there is a non-linear monotonic relationship

How are Pearson’s and Spearman’s correlation coefficients connected?

  • If there is linear correlation then the relationship is also monotonic
    • $r = 1 \Rightarrow r_s = 1$
    • $r = -1 \Rightarrow r_s = -1$
    • However the converse is not true
  • It is possible for Spearman’s rank to be 1 (or -1) but for the PMCC to be different
    • For example: data that follows an exponential growth model
      • $r_s = 1$ as the points are always increasing
      • $r < 1$ as the points do not lie on a straight line
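The exponential case can be checked numerically. The sketch below uses made-up data following $y = 2^x$; the helper names are hypothetical and the simple `ranks` function assumes there are no tied values.

```python
from math import sqrt

def pmcc(xs, ys):
    """Pearson's r computed as S_xy / (S_x * S_y)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_x = sqrt(sum(x * x for x in xs) - sum_x ** 2 / n)
    s_y = sqrt(sum(y * y for y in ys) - sum_y ** 2 / n)
    return s_xy / (s_x * s_y)

def ranks(values):
    """Ranks 1..n from smallest to largest (assumes no tied values)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

xs = [0, 1, 2, 3, 4, 5]
ys = [2 ** x for x in xs]  # 1, 2, 4, 8, 16, 32

r = pmcc(xs, ys)                  # PMCC of the raw values
r_s = pmcc(ranks(xs), ranks(ys))  # Spearman's rank: PMCC of the ranks
print(round(r, 3), round(r_s, 3))
```

For this data the rankings agree completely, so $r_s = 1$, while r comes out at roughly 0.91 because the points curve away from a straight line.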

Are Pearson’s and Spearman’s correlation coefficients affected by outliers?

  • Pearson’s PMCC is affected by outliers
    • as it uses the numerical value of each data point
  • Spearman’s rank is not usually affected by outliers
    • as it only uses the ranks of each data point
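A small numerical experiment illustrates the difference. The data below is invented: a perfectly linear set, then the same set with the last y-value replaced by an extreme one. The helper names are hypothetical and `ranks` assumes no tied values.

```python
from math import sqrt

def pmcc(xs, ys):
    """Pearson's r computed as S_xy / (S_x * S_y)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_x = sqrt(sum(x * x for x in xs) - sum_x ** 2 / n)
    s_y = sqrt(sum(y * y for y in ys) - sum_y ** 2 / n)
    return s_xy / (s_x * s_y)

def ranks(values):
    """Ranks 1..n from smallest to largest (assumes no tied values)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

xs = [1, 2, 3, 4, 5, 6]
ys_clean = [2, 4, 6, 8, 10, 12]     # perfectly linear
ys_outlier = [2, 4, 6, 8, 10, 100]  # last value replaced by an outlier

r_clean = pmcc(xs, ys_clean)
r_outlier = pmcc(xs, ys_outlier)
rs_clean = pmcc(ranks(xs), ranks(ys_clean))
rs_outlier = pmcc(ranks(xs), ranks(ys_outlier))
print(round(r_clean, 2), round(r_outlier, 2))
print(round(rs_clean, 2), round(rs_outlier, 2))
```

Here the PMCC falls from 1 to about 0.71, while Spearman's rank stays at 1 because the outlier is still the largest value and so keeps the same rank. An outlier that changed the order of the ranks would alter Spearman's rank too, which is why it is "not usually" affected rather than never affected.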

Exam Tip

  • You can use your GDC to plot the scatter diagram to help you visualise the data

Worked example

The table below shows the scores of eight students for a maths test and an English test.

Maths (x)      7   18   37   52   61   68   75   82
English (y)    5    3    9   12   17   41   49   97

a)
Write down the value of Pearson’s product-moment correlation coefficient, r.

[Figure: worked solution for part a)]

b)
Find the value of Spearman’s rank correlation coefficient, $r_s$.

[Figure: worked solution for part b)]

c)
Comment on the values of the two correlation coefficients.

[Figure: worked solution for part c)]
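As a cross-check on the GDC, the two coefficients for the test scores in the table can be reproduced with a short Python sketch. The helper names are hypothetical, and the simple `ranks` function relies on there being no tied scores in this data.

```python
from math import sqrt

def pmcc(xs, ys):
    """Pearson's r computed as S_xy / (S_x * S_y)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_x = sqrt(sum(x * x for x in xs) - sum_x ** 2 / n)
    s_y = sqrt(sum(y * y for y in ys) - sum_y ** 2 / n)
    return s_xy / (s_x * s_y)

def ranks(values):
    """Ranks 1..n from smallest to largest (assumes no tied values)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

maths = [7, 18, 37, 52, 61, 68, 75, 82]
english = [5, 3, 9, 12, 17, 41, 49, 97]

r = pmcc(maths, english)                  # part a)
r_s = pmcc(ranks(maths), ranks(english))  # part b)
print(round(r, 3))    # 0.794
print(round(r_s, 3))  # 0.976
```

Only the two lowest English scores are out of order relative to the maths scores, so $r_s$ is very close to 1, which suggests a strong positive monotonic relationship. The value of r is noticeably lower, which suggests the relationship is less well described by a straight line.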

Author: Dan

Dan graduated from the University of Oxford with a First class degree in mathematics. As well as teaching maths for over 8 years, Dan has marked a range of exams for Edexcel, tutored students and taught A Level Accounting. Dan has a keen interest in statistics and probability and their real-life applications.