Lines of Best Fit & Regression Lines (Edexcel GCSE Statistics)

Revision Note

Author: Roger

Expertise: Maths

Line of Best Fit Basics

What is a line of best fit?

  • If a scatter graph suggests that there is a positive or negative correlation

    • a line of best fit can be drawn on the scatter graph

      • This can then be used to make predictions

How do I draw a line of best fit?

  • A line of best fit can often be drawn by eye

    • It is a straight line (use a ruler!)

    • It must extend across the full data set

    • There should be roughly as many points on either side of the line (along its whole length)

    • The gaps between the points and the line should be roughly the same on either side

  • If there is one extreme value (outlier) that does not fit the general pattern

    • then ignore this point when drawing a line of best fit
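The "roughly as many points on either side" rule can also be checked by counting. The Python sketch below is a minimal illustration using Sophie's computer data from the Worked Example. Two things in it are assumptions made up for this sketch: the candidate line time = 5.6 - 0.0034 × price, which stands in for a line drawn by eye, and the helper name `count_sides`.

```python
# Sophie's data from the Worked Example: price (£) and time (secs)
prices = [320, 300, 400, 650, 220, 380, 900, 700]
times = [3.2, 5.3, 4.1, 2.9, 5.1, 4.3, 2.6, 3.8]


def count_sides(xs, ys, intercept, gradient):
    """Count data points above and below the line y = intercept + gradient * x.

    A point lying exactly on the line is counted on neither side.
    """
    above = below = 0
    for x, y in zip(xs, ys):
        line_y = intercept + gradient * x  # height of the line at this x
        if y > line_y:
            above += 1
        elif y < line_y:
            below += 1
    return above, below


# Hypothetical "by eye" line: time = 5.6 - 0.0034 * price
above, below = count_sides(prices, times, 5.6, -0.0034)
print(above, below)  # 4 4
```

An even split like 4 and 4 suggests the line is well placed. You should still check by eye that the gaps on each side are similar along the whole length of the line.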

What is the double mean point?

  • A question may talk about the double mean point

    • This is the point $(\bar{x}, \bar{y})$

      • $\bar{x}$ is the mean of the data values that are plotted along the x-axis

      • $\bar{y}$ is the mean of the data values that are plotted along the y-axis

    • The question may give you the values of $\bar{x}$ and $\bar{y}$

      • Or you may need to calculate the means from the data

  • If a question mentions the double mean point, then the line of best fit must go through the double mean point

    • It should still follow all the other rules for drawing a line of best fit (roughly same number of points on each side, etc.)

  • If a question doesn't mention the double mean point

    • then you don't need to calculate it or worry about drawing the line through it
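To illustrate calculating the means from the data, the Python sketch below finds the double mean point for Sophie's computer data from the Worked Example. The helper name `double_mean_point` is hypothetical and chosen for this sketch only.

```python
# Sophie's data from the Worked Example: price (£) on the x-axis, time (secs) on the y-axis
prices = [320, 300, 400, 650, 220, 380, 900, 700]
times = [3.2, 5.3, 4.1, 2.9, 5.1, 4.3, 2.6, 3.8]


def double_mean_point(xs, ys):
    """Return (x_bar, y_bar): the mean of the x values and the mean of the y values."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return x_bar, y_bar


x_bar, y_bar = double_mean_point(prices, times)
print(x_bar, round(y_bar, 4))  # 483.75 3.9125
```

For this data the double mean point is (483.75, 3.9125). A question that mentioned the double mean point would expect the line of best fit to pass through it.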

How do I use a line of best fit?

  • The line of best fit can be used to predict the value of one variable from the other variable

    • See the Worked Example

  • Predictions should only be made for values that are within the range of the given data

    • Making a prediction within the range of the given data is called interpolation

      • This will normally give a reliable result

    • Making a prediction outside of the range of the given data is called extrapolation 

      • This is much less reliable
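Deciding between interpolation and extrapolation comes down to checking whether a value lies between the smallest and largest data values. The minimal Python sketch below assumes the data values are held in a list. It uses the prices from Sophie's data in the Worked Example, which run from £220 to £900. The function name `prediction_type` is hypothetical.

```python
def prediction_type(value, data):
    """Classify a prediction as interpolation (inside the data range) or extrapolation."""
    if min(data) <= value <= max(data):
        return "interpolation"
    return "extrapolation"


# Prices (£) from Sophie's data in the Worked Example
prices = [320, 300, 400, 650, 220, 380, 900, 700]
print(prediction_type(500, prices))   # interpolation
print(prediction_type(1500, prices))  # extrapolation
```

The £1500 case is the one in part (d) of the Worked Example.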

What about the gradient and y-intercept of a line of best fit?

  • You need to be able to interpret the meaning of the gradient and y-intercept of a line of best fit

  • The gradient is the slope or 'steepness' of the line

    • A question may tell you the gradient of the line of best fit

    • If you need to find it you can calculate it using 'rise over run'

      • Pick two points on the line with coordinates $(x_1, y_1)$ and $(x_2, y_2)$

      • $\text{gradient} = \frac{y_2 - y_1}{x_2 - x_1}$

      • Be careful – the plotted data points will usually not be points on the line!

  • The gradient of the line of best fit tells you the rate of change of the y-axis variable with respect to the x-axis variable

    • This needs to be interpreted in context

      • For example if the x-axis variable is distance travelled in a taxi (in miles) and the y-axis variable is the cost of the taxi ride (in pounds £)

      • then the gradient of the line of best fit (in £ per mile) is the increase in cost, in pounds, for each extra mile travelled

  • The y-intercept is the value of the y-coordinate at the point where the line crosses the y-axis

    • This can be read off the graph

  • The y-intercept of the line of best fit tells you the value of the y-axis variable when the x-axis variable is equal to zero

    • This needs to be interpreted in context

      • For example if the x-axis variable is distance travelled in a taxi (in miles) and the y-axis variable is the cost of the taxi ride (in pounds £)

      • then the y-intercept of the line of best fit tells you the 'flat fee' that is added onto every taxi ride
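The "rise over run" calculation for the taxi example can be sketched in Python. The two points (2 miles, £7.00) and (6 miles, £15.00) are hypothetical readings from a taxi line of best fit, made up for this illustration. The helper names are also hypothetical. As an alternative to reading the y-intercept off the graph, the sketch works it back from one point using $c = y_1 - \text{gradient} \times x_1$.

```python
def gradient(p1, p2):
    """Rise over run between two points (x1, y1) and (x2, y2) on the line."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)


def y_intercept(p1, m):
    """y-value where the line crosses the y-axis, worked back from one point and the gradient."""
    x1, y1 = p1
    return y1 - m * x1


# Hypothetical points read off a taxi line of best fit: (distance in miles, cost in £)
p1, p2 = (2, 7.00), (6, 15.00)
m = gradient(p1, p2)
c = y_intercept(p1, m)
print(f"gradient: £{m:.2f} per mile")     # gradient: £2.00 per mile
print(f"y-intercept: £{c:.2f} flat fee")  # y-intercept: £3.00 flat fee
```

In context, the line suggests each extra mile adds about £2 to the fare, on top of a flat fee of about £3.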

Exam Tip

  • Sliding a ruler around a scatter graph can help to find the right position for the line of best fit!

  • Remember to draw the line through the double mean point if the question mentions it

Worked Example

Sophie wants to know if the price of a computer is related to the speed of the computer.

She tests 8 computers by running the same program on each, measuring how many seconds it takes to finish.

Sophie's results are shown in the table below.

Price (£)   | 320 | 300 | 400 | 650 | 220 | 380 | 900 | 700
Time (secs) | 3.2 | 5.3 | 4.1 | 2.9 | 5.1 | 4.3 | 2.6 | 3.8

(a) Draw a scatter diagram showing these results.

Plot each point carefully using crosses 

A scatter diagram drawn from the data in the question

(b) Write down the type of correlation shown and interpret this in the context of the question. 

The shape formed by the points goes from top left to bottom right (negative gradient), so there is negative correlation
As one quantity increases (price), the other decreases (time)
Note that time decreasing means that the computer is running faster

The graph shows a negative correlation
This means that the more a computer costs, the quicker it is at running the program

(c) Use a line of best fit to estimate the price of a computer that completes the task in 3.4 seconds.

First draw a line of best fit, by eye
Then draw a horizontal line from 3.4 seconds to the line of best fit
Draw a vertical line down to read off the price

A line of best fit drawn on a scatter diagram

A computer that takes 3.4 seconds to run the program should cost around £620

A range of different answers would be accepted, depending on the line of best fit

(d) Explain why this should not be used to estimate the time taken to complete the task by a computer costing £1500.

£1500 is outside the range of the data, so estimating that from the scatter diagram would be extrapolation

Using the diagram for a computer costing £1500 would be extrapolation, and results from extrapolation are usually unreliable

Regression Lines

What is a regression line?

  • Statistical software can calculate the equation for an 'ideal' line of best fit

    • This 'ideal' line of best fit is known as a regression line

      • It is more accurate than a line of best fit drawn by eye

    • You do not need to calculate the equation for a regression line

      • It will be given to you in the question

      • You need to be able to use and interpret it

  • The equation of a regression line will be given in the following form

    • $y = a + bx$

      • $a$ is the y-intercept of the regression line

      • $b$ is the gradient of the regression line

      • Both have the same meaning as they do for any line of best fit

  • You may be asked to draw a regression line onto a scatter diagram

    • You need to know two points on the line

      • Choose two x values (they don't need to correspond to any data values!)

      • Substitute into the equation of the regression line to find the corresponding y values

    • Plot those two points on the scatter diagram and draw a straight line through them

      • Use a ruler!

  • A regression line drawn from its equation will always go through the double mean point for the data set

    • You may be required to use this fact in an exam question
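You won't have to calculate a regression line in the exam, but it can help to see what the software is doing. The Python sketch below uses the least squares method, the standard way of finding a regression line of $y$ on $x$. It applies the method to Sophie's computer data from the Worked Example. The function name `regression_line` is hypothetical. The resulting line will not exactly match a line drawn by eye.

```python
# Sophie's data from the Worked Example: price (£) as x, time (secs) as y
prices = [320, 300, 400, 650, 220, 380, 900, 700]
times = [3.2, 5.3, 4.1, 2.9, 5.1, 4.3, 2.6, 3.8]


def regression_line(xs, ys):
    """Least squares regression line of y on x, returned as (a, b) for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx        # gradient
    a = y_bar - b * x_bar  # y-intercept, chosen so the line passes through (x_bar, y_bar)
    return a, b


a, b = regression_line(prices, times)
print(f"y = {a:.3f} + ({b:.5f})x")  # y = 5.437 + (-0.00315)x

# The line goes through the double mean point (x_bar, y_bar)
x_bar = sum(prices) / len(prices)
y_bar = sum(times) / len(times)
print(abs((a + b * x_bar) - y_bar) < 1e-9)  # True
```

The way $a$ is calculated explains why a regression line always passes through the double mean point.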

Exam Tip

  • Be careful with the $y = a + bx$ form of the regression line

    • It is slightly different from the $y = mx + c$ version of a straight line equation that you might be familiar with

  • Remember that the regression line always goes through the double mean point

Worked Example

Rebecca, a regular jogger, recorded the number of calories she was able to burn (y calories) by running different distances (x km). This data is shown on the scatter diagram below.

A scatter diagram showing data for calories burned against distance run

The equation of the regression line for the data in the scatter diagram is $y = 18.8 + 62.2x$

(a) Interpret the number 62.2 in the equation of the regression line in the context of the question.

62.2 is the gradient of the regression line
It tells you how much the y-variable changes when the x-variable increases by 1

It means that for every extra kilometre she runs, she burns 62.2 more calories

(b) Draw the regression line on the scatter diagram.

Find the coordinates of two points on the line and draw the line through these points

When $x = 0$, $y = 18.8 + 62.2(0) = 18.8$

When $x = 10$, $y = 18.8 + 62.2(10) = 640.8$


So draw the line through the points (0, 18.8) and (10, 640.8)
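Substituting the two chosen x values can be sketched in Python. It uses Rebecca's regression equation $y = 18.8 + 62.2x$ and the same x values, 0 and 10. The helper name `points_to_plot` is hypothetical.

```python
def points_to_plot(a, b, x_values):
    """Coordinates on the regression line y = a + b*x for each chosen x value."""
    return [(x, a + b * x) for x in x_values]


for x, y in points_to_plot(18.8, 62.2, [0, 10]):
    print(f"({x}, {y:.1f})")
# (0, 18.8)
# (10, 640.8)
```

Choosing x values that are far apart, such as either end of the axis, makes it easier to draw the line accurately with a ruler.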

The scatter diagram from the question with the regression line drawn on

The mean of the data values for the distance run is 8 km.

(c) Use this information to find the mean of the data values for the calories burned.

Use the fact that the regression line always goes through the double mean point

Draw a vertical line up from 8 on the x-axis until it hits the regression line
Then draw a horizontal line from there until it hits the y-axis

Scatter diagram and regression line, with lines drawn to find the mean of the y values

Read the value off the y-axis (it's a little bit less than 520)

516 calories

Marks would be awarded for a range of answers around that value
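As a check on the value read from the graph, the regression line passes through $(\bar{x}, \bar{y})$. So the mean can also be found by substituting $\bar{x} = 8$ straight into the equation:

```latex
\bar{y} = 18.8 + 62.2\bar{x} = 18.8 + 62.2(8) = 18.8 + 497.6 = 516.4
```

This rounds to 516 calories, which agrees with the graph reading.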



Author: Roger

Roger's teaching experience stretches all the way back to 1992, and in that time he has taught students at all levels between Year 7 and university undergraduate. Having conducted and published postgraduate research into the mathematical theory behind quantum computing, he is more than confident in dealing with mathematics at any level the exam boards might throw at you.