AQA A Level Maths: Statistics

Topic Questions

2.4 Correlation & Regression

1a
4 marks

For each of the following four scatter graphs, identify the type and strength of any linear correlation shown.

1b
1 mark

Sketch a scatter graph to show a perfect negative linear correlation between two variables.

2
5 marks

A teacher is interested in the relationship between the number of hours her students spend on a phone per day and the number of hours they spend on a computer. She takes a sample of nine students and records the results in the table below.

Hours spent on a phone per day 7.6 7 8.9 3 3 7.5 2.1 1.3 5.8
Hours spent on a computer per day 1.7 1.1 0.7 5.8 5.2 1.7 6.9 7.1 3.3

 

(i)
Plot a scatter diagram of this data on the axes below.

(ii)
Describe the linear correlation shown in your diagram.

(iii)
Interpret the correlation in the context of the question.

3a
4 marks

The table below shows data for a sample of 8 people comparing the maximum number of pull-ups they are able to complete, x, with the maximum number of press-ups, y.

Number of pull-ups (x) 5 10 8 3 6 8 1 4
Number of press-ups (y) 24 34 36 18 30 35 11 19

 

(i)
Plot a scatter diagram on the axes below.

(ii)
Describe the type of correlation shown in your scatter diagram.

3b
4 marks

The equation of the regression line of y on x is y = 3x + 9.

(i)
Add this regression line to your scatter diagram. 

(ii)
Explain the purpose of regression lines and how they may be used.
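
As an illustration of where a regression line such as y = 3x + 9 comes from, the sketch below fits a least-squares line to the pull-up and press-up data in the table. It is a minimal sketch rather than part of the original question: the function name least_squares_line and the variable names are hypothetical, and it assumes the regression line of y on x is the ordinary least-squares line.

```python
# Pull-up (x) and press-up (y) data copied from the table in part (a).
x = [5, 10, 8, 3, 6, 8, 1, 4]
y = [24, 34, 36, 18, 30, 35, 11, 19]


def least_squares_line(xs, ys):
    """Return (intercept, gradient) of the least-squares line of ys on xs.

    Hypothetical helper written for this illustration only.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # S_xy and S_xx: corrected sum of products and corrected sum of squares.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    s_xx = sum((a - mean_x) ** 2 for a in xs)
    gradient = s_xy / s_xx
    intercept = mean_y - gradient * mean_x
    return intercept, gradient


intercept, gradient = least_squares_line(x, y)
print(f"y = {gradient:.2f}x + {intercept:.2f}")
```

For this data the fitted gradient and intercept come out close to 3 and 9, in line with the equation quoted in the question.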

4a
1 mark

A class is asked to collect a sample of bivariate data. They collect data on the shoe size, S, and the arm span, A cm, of 20 randomly selected boys from the class. 

Explain what is meant by the term ‘bivariate data’.

4b
3 marks

The class plot the data in a scatter diagram and find the equation of the regression line of A on S to be A = 4.5S + 133. These are both plotted in the diagram below.

(i)
Interpret the value 4.5 in the context of the question.

(ii)
Interpret the value 133 in the context of the question.

(iii)
Explain how the sign of the coefficient of S in the equation is related to the correlation shown in the scatter diagram.

5a
1 mark

The following table shows data comparing the length of time a cake was baked for, t minutes, with the mass of the cake once it has cooled, m grams. Each cake in the sample weighed the same before being baked.

t 37 35 36 31 30 28 36
m 825 868 812 943 947 997 837


State which variable is the explanatory (independent) variable and which is the response (dependent) variable.

5b
2 marks

The equation for the regression line of m on t is m = 1531 - 19t.

(i)
Use the regression line to estimate the mass of a cake if it is baked for 32 minutes.

(ii)
Comment on the validity of your estimate in part (b)(i).
5c
2 marks
(i)
Use the regression line to estimate the mass of a cake if it is baked for 80 minutes.

(ii)
Comment on the validity of your estimate in part (c)(i).
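
One way to frame the validity comments in parts (b) and (c) is to check whether the baking time lies inside the range of the recorded data (interpolation) or outside it (extrapolation). The sketch below does this for the two times in the question. The function names estimate_mass and is_interpolation are hypothetical, introduced only for illustration.

```python
# Baking times (minutes) copied from the table in part (a).
t_data = [37, 35, 36, 31, 30, 28, 36]


def estimate_mass(t):
    """Estimate the cooled mass in grams from the regression line m = 1531 - 19t."""
    return 1531 - 19 * t


def is_interpolation(t, observed):
    """Return True if t lies within the range of the observed values."""
    return min(observed) <= t <= max(observed)


for t in (32, 80):
    kind = "interpolation" if is_interpolation(t, t_data) else "extrapolation"
    print(f"t = {t} minutes: estimated m = {estimate_mass(t)} g ({kind})")
```

An estimate made inside the data range is generally more reliable than one made far outside it, where there is no evidence that the linear trend continues.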

6a
3 marks

Isla is investigating whether the number of deep-fried chocolate bars a person eats has an impact on his or her level of fitness. She takes a sample of 10 people and records how many deep-fried chocolate bars they eat during a month, c, and then times how long it takes them to complete a 100-metre sprint, t seconds, at the end of the month.

She plotted the data in a scatter diagram and found the equation of the regression line of t on c to be t = 5c+12.

Find an estimate for the 100-metre sprint time for a person if they eat:

(i)
2 deep-fried chocolate bars in a month,

(ii)
54 deep-fried chocolate bars in a year.

6b
2 marks

Describe the type of linear correlation you would expect to see on Isla’s scatter diagram and state which value in the regression equation tells you this. 

7a
2 marks

Terrence has collected data comparing how many adverts, A, he sees whilst watching TV for different lengths of time, t hours. With this data, Terrence plotted the scatter diagram shown below.

(i)
Describe the linear correlation shown in this scatter diagram.

(ii)
What does the correlation suggest about the relationship between the number of adverts Terrence sees and the length of time he watches TV?
7b
3 marks

State, with a reason, whether each of the following equations would be appropriate for the equation of the regression line of A on t:

(i)
A=18t+5,

(ii)
t=18A+5,

(iii)
A=-18t+5.

8a
3 marks

Two liquids are mixed and heated to a particular temperature. The time, in seconds, it takes the two liquids to react is recorded. The scatter diagram below shows the results.

(i)
Identify the two outliers shown on the scatter diagram.

(ii)
Clean the data by removing these outliers and find the mean reaction time.
8b
2 marks
(i)
Describe the correlation shown by the scatter diagram.

(ii)
A student says that if the mixture is heated to 60 °C the two liquids will react almost instantly.  Explain why the student may be incorrect.

1a
1 mark

Ella measures how the extension, x mm, of a thin piece of metal wire varies with the force applied to it, F kN. She records her results in the table below.

F 15 32 49 76 99 106 112 124 132
x 0.2 0.4 0.6 0.9 1.4 1.5 1.6 1.8 1.8


Ella calculates the regression line of F on x to be F = 0.004 - 69.3x.

Explain why this equation must be wrong.

1b
1 mark

The correct equation for the regression line of F on x is F = 6.16 + 67.6x.

Interpret the value of 67.6 in this context.

1c
2 marks

Using the correct regression line, Ella estimates that if she applies a force of 1000 kN then the wire will show an extension of 14.7 mm. 

Give two reasons why Ella’s estimate may not be accurate.

2a
4 marks

The table below shows a comparison of the average house price, H (£100 000), and the average yearly income, I (£10 000), for different areas around the UK in 2021.

Area H I
Conwy 155.1 26.4
Perth and Kinross 181.3 27.9
Richmondshire 190.3 25.1
Monmouthshire 232.6 31.4
Trafford 260.2 32.0
Gwynedd 148.5 23.6
Basingstoke and Deane 297.7 33.7
Daventry 259.2 29.5

(i)
Plot a scatter diagram of I against H, and

(ii)
describe the correlation shown.
2b
2 marks

The equation of the regression line of I on H is calculated to be I = 0.06H + 15.92.
A particularly unscrupulous politician uses this to claim that if you want a salary of £35 000, all you need to do is buy a house that costs £583 000.

Comment on the validity of the politician's claim.

3a
2 marks

Two researchers, Alwyn and Beth, are working on a project collecting data about the self-reported happiness of students on a scale from 0 to 10, H, and the number of exams sat by those students, n. After collecting data from 1000 students, they construct a scatter diagram and find the equation of the regression line of H on n to be H = 7.63 - 0.82n.

Explain what correlation the data is likely to show in the scatter diagram.

3b
1 mark

What information about the original data set would need to be checked before using the regression line equation to estimate the self-reported happiness of a student sitting 8 exams?

3c
2 marks

After calculating the equation of the line of regression, Alwyn accidentally deletes all the data collected about the self-reported happiness scores. Alwyn says it's not a problem since he can use the regression line and the number of exams sat to recalculate all the values. Beth says that Alwyn is wrong and the original data is lost forever.

Explain which researcher is correct.

4a
1 mark

A consultant is trying to improve the efficiency of how a factory making chewing gum operates.  To help them do this, they collect many types of data about the factory workers.  One such type of data is the number of chewing gum packets made per shift.  The list below shows the number of chewing gum packets made by a particular worker (Worker 1) during the last 10 shifts worked.

392     414     536     474     212     396     427     545     459      234

Calculate the mean number of chewing gum packets made per shift by Worker 1 to the nearest whole number of packets.

4b
5 marks

The table below shows the mean number of chewing gum packets, N, made by various workers along with how many hours of training, T hours, they have received.

Worker 1 2 3 4 5 6 7 8 9
N ___ 512 499 359 393 432 456 520 475
T 18 24 22.5 15 16 20 21 22 21

(i)
Including your answer from (a), plot a scatter diagram of the data in the table above.
(ii)
Given that the equation of the regression line of N on T is N = 18T + 95, add the regression line to your scatter diagram.
4c
3 marks

The consultant then goes on to collect even more data on other factory workers and records some of it in the table below.

Worker 10 11 12 13 14 15 16 17 18
N 600 598 584 602 593 585 591 601 605
T 29 28.5 32 29 34.5 30.5 37 31 30

Without adding this new data to your scatter diagram, what advice could the consultant give to the factory to improve the efficiency of their workers?

5a
4 marks

The table below shows data from the large data set on the engine size, S cm³, and the mass of the vehicle, M kg, for a random sample of 10 cars that were first registered in 2016.

M 925 1225 1141 1350 1425 1280 1613 1505 1820 1816
S 998 1248 1398 1399 1499 1598 1956 1984 1995 1997

(i)
Plot a scatter diagram of S against M, and
(ii)
explain the correlation shown in this context.
5b
6 marks

The equation for the regression line of S on M is S = -7.78 + 1.15M.

The table below shows S and M for a random sample of 3 other cars first registered in 2016.

M 1095 1485 1232
S 1242 1798 999


Considering this second sample, use the regression line equation and the values of M to predict values of S and find the average percentage difference of these estimated values of S from the true values of S. Hence, comment on how accurately this regression line equation can predict values.
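
The calculation described above can be sketched as follows. This is an illustrative sketch only: the names predict_engine_size, percentage_differences and average_difference are hypothetical, and "percentage difference" is assumed here to mean the absolute difference between the predicted and true values of S as a percentage of the true value.

```python
# Second sample from the question: masses M (kg) and true engine sizes S (cm^3).
masses = [1095, 1485, 1232]
true_sizes = [1242, 1798, 999]


def predict_engine_size(mass):
    """Predict S from the regression line S = -7.78 + 1.15M."""
    return -7.78 + 1.15 * mass


# Assumed definition: |predicted - true| / true * 100 for each car.
percentage_differences = [
    abs(predict_engine_size(m) - s) / s * 100
    for m, s in zip(masses, true_sizes)
]
average_difference = sum(percentage_differences) / len(percentage_differences)

for m, s, d in zip(masses, true_sizes, percentage_differences):
    print(f"M = {m}: predicted S = {predict_engine_size(m):.2f}, true S = {s}, difference = {d:.1f}%")
print(f"Average percentage difference = {average_difference:.1f}%")
```

Listing the individual differences alongside the average helps show whether one car dominates the result, which is relevant when commenting on the accuracy of the predictions.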

5c
1 mark

Using your knowledge of the large data set, explain whether there is likely to be a causal relationship between S and M.

5d
1 mark

A researcher claims that this correlation between S and M is a coincidence. How could you use data from the large data set to check this claim?

1a
2 marks

A teacher collected the maths and physics test scores of a number of students and drew a scatter diagram to represent this data.

Describe the correlation shown by the scatter diagram, and interpret the correlation in context.

1b
2 marks

An alternative therapist collected data on his clients’ reported levels of anxiety as well as the number of trees they had hugged in the course of therapy. He drew a scatter diagram to represent this data.

Describe the correlation shown by the scatter diagram, and interpret the correlation in context.

2a
3 marks

The table below shows data from the United States regarding annual per capita cheese consumption (in pounds) and the divorce rate (number of divorces per 1000 people) for ten years between 2000 and 2018:

Year 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018
Cheese consumption (pounds) 32.1 32.8 33.6 34.8 34.5 35 35.5 36.2 38.5 40
Divorce rate (number per 1000 people) 4 3.9 3.7 3.7 3.5 3.6 3.4 3.2 3.0 2.9


Draw a scatter diagram to represent this data, with per capita cheese consumption on the horizontal axis and divorce rate on the vertical axis.

2b
2 marks
(i)
Describe the correlation between per capita cheese consumption and divorce rate.

(ii)
Do you think there is a causal relationship between per capita cheese consumption and divorce rate in the United States?
Explain your reasoning.

3a
2 marks

Myfanwy has been applying different voltages (v, measured in volts) to an electrical circuit in her lab and recording the resulting currents (i, measured in amps).  The smallest voltage she applied was 0.5 volts, and the largest voltage she applied was 120 volts.

She found the equation of the regression line of i on v to be  i = 0.056+0.332v.  

(i)
Interpret the value 0.332 in this context.

(ii)
Use the equation to predict the current for a voltage of 70 volts.
3b
2 marks

Explain why it would not be sensible to use the regression equation to work out:

(i)
the current resulting from a voltage of 2000 volts

(ii)
the voltage corresponding to a current of 20 amps.
3c
2 marks

Myfanwy’s lab partner suggests that the value 0.056 in the regression equation represents the current in the circuit when the voltage applied is zero.  Explain why he might suggest this, but also suggest a reason why his interpretation is most likely incorrect.

4a
2 marks

The following table shows the height, h cm, and weight, w kg, for each of eleven students at a sixth form college.

h 167 182 176 173 17 174 177 178 172 170 169
w 51 62 69 65 65 56 64 62 51 55 58


The following statistics were calculated for the data on height:

mean = 159.5 cm,   standard deviation = 45.3 cm

An outlier is an observation which lies more than 2 standard deviations from the mean.

(i)
Show that h=17 is an outlier.

(ii)
Explain why this outlier should be omitted from the data.
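
The outlier rule in this question can be checked numerically with the short sketch below. It is illustrative only: the variable names are hypothetical, and it assumes the quoted standard deviation is the population standard deviation (dividing by n), which is what Python's statistics.pstdev computes.

```python
import statistics

# Heights (cm) copied from the table, including the suspect value 17.
heights = [167, 182, 176, 173, 17, 174, 177, 178, 172, 170, 169]

mean_h = statistics.mean(heights)
sd_h = statistics.pstdev(heights)  # population standard deviation (assumed)

# Rule from the question: more than 2 standard deviations from the mean.
lower_bound = mean_h - 2 * sd_h
upper_bound = mean_h + 2 * sd_h
outliers = [h for h in heights if h < lower_bound or h > upper_bound]

print(f"mean = {mean_h:.1f} cm, standard deviation = {sd_h:.1f} cm")
print(f"bounds: {lower_bound:.1f} cm to {upper_bound:.1f} cm")
print("outliers:", outliers)
```

Under that assumption the computed mean and standard deviation agree with the values quoted in the question to 1 decimal place.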
4b
5 marks

With the outlier data excluded, the equation of the regression line of w on h is w = -87.6 + 0.845h.

(i)
Exclude the outlier data from the recorded measurements and draw a scatter diagram to represent the data for the remaining ten students.

(ii)
Draw the regression line on your diagram.
4c
2 marks

Based on your diagram, along with the regression equation, to what extent would you say that a person’s height may be used as an accurate predictor of his or her weight?

5a
1 mark

The table below shows the mass, m kg, and the CO2 emissions, c g/km, for a sample of 12 Ford cars registered in 2016, from the large data set.

m 1833 1144 1327 1399 989 1555 1806 1497 1730 2030 1211 1088
c 134 138 119 98 115 119 129 159 225 152 138 122

The equation of the regression line of c on m is c = 79.5 + 0.0394m.

Give an interpretation of the value of the gradient of the regression line.

5b
2 marks

Use your knowledge of the large data set to explain whether there is likely to be a causal relationship between the mass of a car and its CO2 emissions.

5c
2 marks

Explain why it would not be reliable to use this regression equation to predict:

(i)
the CO2 emissions for a car with a mass of 2500 kg
(ii)
the mass of a car with CO2 emissions of 170 g/km.
5d
3 marks

The median and quartiles for the emissions data are:

Q₁ = 119     Q₂ = 131.5     Q₃ = 145

An outlier is defined as a value which lies either more than 1.5 × the interquartile range above the upper quartile or more than 1.5 × the interquartile range below the lower quartile.

(i)
Show that c = 225 is an outlier.
(ii)
Give a reason why you might include, and a reason why you might exclude, data from the car for which c = 225.
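
The interquartile range rule used in this part can be sketched as below. This is a minimal illustration with hypothetical variable names. It takes the quartiles as given in the question rather than recomputing them, since different quartile conventions can give slightly different values.

```python
# CO2 emissions (g/km) copied from the table in part (a).
emissions = [134, 138, 119, 98, 115, 119, 129, 159, 225, 152, 138, 122]

# Quartiles as quoted in the question.
q1, q3 = 119, 145
iqr = q3 - q1

# Fences: 1.5 x IQR below the lower quartile and above the upper quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [c for c in emissions if c < lower_fence or c > upper_fence]

print(f"IQR = {iqr}, fences = {lower_fence} and {upper_fence}")
print("outliers:", outliers)
```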
5e
2 marks

Using your knowledge of the large data set, suggest two other factors about cars that should also be considered if creating a model to predict a car's CO2 emissions.

1
5 marks

Four statisticians are arguing over which line best highlights the trend of the set of data shown in the scatter diagram below.

The first statistician draws, by eye, a line of best fit and claims its equation is y = -0.05 + 0.17x. The second draws, again by eye, a different line of best fit and claims its equation is y = -1.08 + 1.3x. The third calculates the equation of the regression line of y on x and claims it is y = 0.18 + 0.11x. The fourth statistician claims that all three of the other statisticians are definitely wrong and that there is no line of best fit.

By adding each of these lines to the scatter diagram, comment on the claims of each of the statisticians.

2a
4 marks

Paige takes a sample of 9 cities throughout the UK to compare the percentage of people living in a city who identify as vegan, V%, and the percentage of restaurants offering vegan options in that same city, R%.

The regression line of R on V is calculated and used to predict values of R for V = 1.35 and V = 1.03. The values returned are R = 70.73 and R = 50.314 respectively.

Find the equation of the regression line of R on V.
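
Because both predicted values lie exactly on the regression line, the line can be recovered from the two points (1.35, 70.73) and (1.03, 50.314). The sketch below shows one way to do this. The variable names and the helper predict_r are hypothetical, introduced only for illustration.

```python
# Two points known to lie on the regression line of R on V.
v1, r1 = 1.35, 70.73
v2, r2 = 1.03, 50.314

# Gradient from the change in R over the change in V.
gradient = (r1 - r2) / (v1 - v2)
# Intercept from substituting one point into R = intercept + gradient * V.
intercept = r1 - gradient * v1


def predict_r(v):
    """Predict R (%) for a given V (%) using the recovered line."""
    return intercept + gradient * v


print(f"R = {intercept:.1f} + {gradient:.1f}V")
```

The same helper can then be evaluated at any value of V inside the range of the data, for example the 1.16% mentioned in the next part.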

2b
2 marks

In one of the cities, 1.16% of people were vegan and 55.9% of restaurants offered vegan options.

Use the equation of the regression line of R on V to estimate the percentage of restaurants offering vegan options in a city in which 1.16% of people are vegan. Give your estimated value of R to 3 significant figures. Compare this to the information above.

2c
2 marks

Paige discovers that in one city every restaurant offers vegan options. Paige suggests that the equation of the regression line of R on V can be used to find the percentage of people in this city who identify as vegan. Explain why Paige is likely wrong.

3a
5 marks

A ride sharing app collected data on the time, t minutes, taken to complete a journey of distance, d miles.  Data from a random sample of 8 journeys is detailed in the table below.

d 3.9 6.6 8.5 1.3 1.7 3.7 7.4 6.1
t 25 36 39 6 8 19 38 32


By plotting a scatter diagram of t on d for this data, explain whether or not it is appropriate to use a linear regression model on this data.

3b
1 mark

Using a new random sample of thousands of journeys, the ride sharing app calculated the regression line of time on distance to be t = -1.8 + 5.9d.

The app uses this regression equation to predict that a journey of distance 7 km would take 39.5 minutes. Explain why this is incorrect.

3c
1 mark

The regression equation predicts that for journeys of less than 0.3 miles the time taken will be less than zero minutes. What is the most likely reason that the regression equation gives this false prediction?

4a
2 marks

A maths teacher randomly selects 10 students from a class of 30 to answer a survey. The survey asks students how many practice questions they completed when revising for a recent test, Q, and their percentage score in that test, S%. Summary statistics for Q are shown below.

Q̄ = 21     Range of Q = 20

The equation of the regression line of S on Q is S = 34 + 2Q.

Explain which variable is the response variable.

4b
6 marks

Use the regression equation to find an estimate for the mean value and range of S. State any assumptions that are needed.

4c
2 marks

Comment on the reliability of using the regression equation to:

(i)
estimate the scores of the other students in the maths class,

(ii)
estimate the scores of this cohort of students in a science class.

5a
4 marks

An owner of a beach resort is comparing parasol sales, £p, and sun cream sales, £s, at the resort over a period of eleven days. The data is standardised by coding the variables using x = (s - 153)/103 and y = (p - 32)/37. The values for the first ten days are plotted on the scatter diagram below.

(i)
On the eleventh day, the resort sold £246 worth of sun cream and £69 worth of parasols. Use this information to complete the scatter diagram. 

(ii)
The equation for the regression line of y on x is  y = 0.19+0.83x.  Add the regression line to the scatter diagram.
5b
5 marks
(i)
Show that by using the regression line of y on x and the coding equations above, the regression line of p on s can be written in the form  p = a + bs where a and b are constants to be found to 3 significant figures.

 

(ii)
Hence, or otherwise, find an estimate for the amount of parasol sales on a day where there are £170 of sun cream sales.
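
The decoding step in this part can be sketched numerically as below. This is an illustrative sketch, not a model answer: the constant and function names are hypothetical, and it simply substitutes the coding x = (s - 153)/103 and y = (p - 32)/37 into the coded line y = 0.19 + 0.83x and rearranges for p.

```python
# Coding used in the question: x = (s - 153) / 103 and y = (p - 32) / 37.
S_SHIFT, S_SCALE = 153, 103
P_SHIFT, P_SCALE = 32, 37

# Coded regression line of y on x: y = 0.19 + 0.83x.
CODED_INTERCEPT, CODED_GRADIENT = 0.19, 0.83

# Substituting the coding gives (p - 32) / 37 = 0.19 + 0.83 * (s - 153) / 103.
# Multiplying through by 37 and collecting terms in s gives p = a + b * s.
b = P_SCALE * CODED_GRADIENT / S_SCALE
a = P_SHIFT + P_SCALE * CODED_INTERCEPT - b * S_SHIFT


def estimate_parasol_sales(s):
    """Estimate parasol sales (pounds) from sun cream sales s (pounds)."""
    return a + b * s


print(f"p = {a:.3g} + {b:.3g}s")
print(f"Estimate for s = 170: {estimate_parasol_sales(170):.2f}")
```

The sketch uses the unrounded constants for the estimate. Using the values of a and b rounded to 3 significant figures gives a slightly different final answer.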

6a
4 marks

An environmentalist is using the large data set to see if there is a correlation between the CO emissions, c g/km, and the NOX emissions, n g/km, of cars first registered in 2016. To do this, the environmentalist takes 5 different samples, each containing 6 cars, and calculates the mean values of c and n for each sample. The data is shown in the table below.

c̄ 0.20 0.28 0.14 0.52 0.23
n̄ 0.03 0.06 0.01 0.02 0.05

(i)
On the grid below, plot a scatter diagram of n̄ on c̄ for the data above.
(ii)
Circle the point that does not fit the trend.
(iii)
Ignoring the point that does not fit the trend, the equation of the regression line of n̄ on c̄ is n̄ = -0.04 + 0.36c̄. Add the regression line to your scatter diagram.
6b
3 marks

The environmentalist now wishes to find an estimate for the total NOX emissions produced by a group of 6 cars that have a mean value of CO emissions of 0.17 g/km after each car has driven 20 km.
