OCR A Level Maths: Statistics

Topic Questions

2.3 Working with Data

1a
3 marks

In a conkers competition, the number of strikes required to smash an opponent’s conker (and thus win a match) is recorded for 15 matches. The results are given below.

6 2 9 10 9 12 5  
8 7 5 11 9 17 8 9

Find the median, the upper and lower quartiles, and the interquartile range for the number of strikes required to smash a conker.
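
One way to organise a quartile calculation is sketched below in Python. It assumes the position rule often taught at A Level: work out k × n/4, and if this is a whole number average that ordered value and the next one, otherwise round up to the next ordered value. Other conventions exist and can give slightly different quartiles. The function name `quartile` and the sample values are illustrative only and are not taken from the question.

```python
def quartile(values, k):
    """Return the k-th quartile (k = 1, 2 or 3) using the n/4 position rule.

    Assumed convention: let p = k * n / 4. If p is a whole number, average
    the p-th and (p + 1)-th ordered values; otherwise take the next ordered
    value above position p.
    """
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(k * n, 4)
    if remainder == 0:
        # p is whole: average the p-th and (p + 1)-th values (1-indexed).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # p is not whole: round up, i.e. take the (whole + 1)-th value (1-indexed).
    return ordered[whole]


# Illustrative data only (not the conker data from the question).
sample = [7, 1, 4, 10, 3, 8, 6]
q1, q2, q3 = (quartile(sample, k) for k in (1, 2, 3))
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

The interquartile range is then simply the upper quartile minus the lower quartile.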

1b
2 marks

An outlier is defined as any data value that falls either more than 1.5 × (interquartile range) above the upper quartile or less than 1.5 × (interquartile range) below the lower quartile.

Identify any outliers.
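
The 1.5 × (interquartile range) rule can be sketched in Python as follows. The quartiles and data values used here are hypothetical, chosen only to show the mechanics, and the helper names `outlier_fences` and `find_outliers` are not part of the question.

```python
def outlier_fences(lower_quartile, upper_quartile):
    """Return the (lower, upper) fences for the 1.5 x IQR outlier rule."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - 1.5 * iqr, upper_quartile + 1.5 * iqr


def find_outliers(values, lower_quartile, upper_quartile):
    """List the values lying strictly outside the fences."""
    low, high = outlier_fences(lower_quartile, upper_quartile)
    # A value exactly on a fence is not treated as an outlier here.
    return [v for v in values if v < low or v > high]


# Hypothetical quartiles (Q1 = 10, Q3 = 20) and data, for illustration only.
print(outlier_fences(10, 20))
print(find_outliers([3, 12, 18, 36, -6], 10, 20))
```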

2a
2 marks

A hotel manager recorded the number of towels that went missing at the end of each day for 12 days.  The results are below.

2 4 1 0 3 4
3.2 9 3 2 4 5

The data value 3.2 is not an outlier but is an error.
Explain why 3.2 is an error and why it should be removed from the data set.

2b
3 marks

With the data value 3.2 removed, find the mean and the standard deviation for the number of towels missing at the end of each day.
You may use the summary statistics $n = 11$, $\sum x = 37$ and $\sum x^2 = 181$ with the formulae $\bar{x} = \frac{\sum x}{n}$ and $\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$.
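
As a rough sketch of how the two formulae work together, the Python below computes a mean and standard deviation from summary statistics alone. The numbers passed in are for an illustrative data set (2, 4, 4, 4, 5, 5, 7, 9), not the towel data, and the function name `mean_and_sd` is an assumption made for this example.

```python
import math


def mean_and_sd(n, sum_x, sum_x_squared):
    """Mean and standard deviation from n, sum of x and sum of x squared."""
    mean = sum_x / n
    # Variance = (mean of the squares) - (square of the mean).
    variance = sum_x_squared / n - mean ** 2
    # Guard against a tiny negative value caused by floating-point rounding.
    return mean, math.sqrt(max(variance, 0.0))


# Illustrative summary statistics for 2, 4, 4, 4, 5, 5, 7, 9:
# n = 8, sum of x = 40, sum of x squared = 232.
mean, sd = mean_and_sd(8, 40, 232)
print(mean, sd)
```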

2c
2 marks

An outlier is defined as any data value lying outside of 2 standard deviations of the mean.  Find any outliers in the data (still excluding 3.2) and justify whether these should be removed from the data set or not.
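
A minimal sketch of the "2 standard deviations from the mean" rule is given below. It uses Python's `statistics` module with the population standard deviation (dividing by n), which matches the formula used in this topic. The sample data are hypothetical and the function name `two_sd_outliers` is illustrative.

```python
import statistics


def two_sd_outliers(values):
    """Return the values lying more than 2 standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD: divides by n
    low, high = mean - 2 * sd, mean + 2 * sd
    return [v for v in values if v < low or v > high]


# Hypothetical data, for illustration only.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 13]
print(two_sd_outliers(sample))
```

Whether a flagged value should then be removed is a separate judgement that depends on the context of the data.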

3a
1 mark

Joe counts the number of different species of bird visiting his garden each day for a week. The results are given below.

7 8 5 12 9 7 3


Calculate the mean number of different species of bird visiting Joe’s garden.

3b
3 marks

Joe continues to record the number of different species of bird visiting his garden each day for the rest of the month and calculates the mean number of different species is 9.25 for the remaining 24 days.

Joe says, using the data from the whole month, he would expect to see 9 different species every day. Explain whether Joe is correct. You must support your answer with clear working.
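
Combining the means of two groups of different sizes requires weighting each mean by its group size, rather than averaging the two means directly. The sketch below shows the idea with hypothetical group sizes and means; the function name `combined_mean` is an assumption for illustration.

```python
def combined_mean(groups):
    """Overall mean from a list of (count, mean) pairs."""
    # Recover each group's total, add the totals, then divide by the overall count.
    total = sum(count * mean for count, mean in groups)
    n = sum(count for count, _ in groups)
    return total / n


# Hypothetical groups: 4 values with mean 10 and 6 values with mean 15.
print(combined_mean([(4, 10.0), (6, 15.0)]))
```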

3c
2 marks

Later, Joe notices that one of the values in his data is 8.8.  Explain why this must be an error and justify whether you think this value should be removed from the data set or not.

4a
3 marks

The cumulative frequency diagram below shows the length of 100 phone calls, in minutes, made to a computer help centre for one morning.

[Figure: cumulative frequency diagram of phone call lengths]

(i)
Use the cumulative frequency graph to estimate the 10th and 90th percentiles.

(ii)
Find the 10th to 90th interpercentile range.

4b
3 marks

In the afternoon of the same day, the lengths of another 100 phone calls to the computer help centre were recorded.  The median length of these calls was 15 minutes and the 10th to 90th interpercentile range was 18 minutes.

Compare the location (median) and spread (interpercentile range) of the calls in the morning and the afternoon.

5a
3 marks

Two geologists are measuring the size of rocks found on a beach in front of a cliff.
The geologists record the greatest length, in millimetres, of each rock they find at distances of 5 m and 25 m from the base of the cliff.  They randomly choose 20 rocks at each distance.  Their results are summarised in the table below.

Distance from cliff base | 5 m | 25 m
Number of rocks, n | 20 | 20
$\sum x$ | 3885 | 2220
$S_{xx}$ | 369 513.75 | 287 580

Using the formulae $\bar{x} = \frac{\sum x}{n}$ and $\sigma = \sqrt{\frac{S_{xx}}{n}}$, find the mean and standard deviation for the size of rocks at both 5 m and 25 m from the base of the cliff.
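
For reference, the standard deviation formula in terms of $S_{xx}$ agrees with the version written in terms of $\sum x^2$. The short derivation below assumes the usual definition of $S_{xx}$ as the sum of squared deviations from the mean, which the question does not state explicitly.

```latex
\begin{aligned}
S_{xx} &= \sum (x - \bar{x})^2
        = \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2 \\
       &= \sum x^2 - n\bar{x}^2
        = \sum x^2 - \frac{\left(\sum x\right)^2}{n}, \\[4pt]
\sigma &= \sqrt{\frac{S_{xx}}{n}}
        = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}.
\end{aligned}
```

The second equality on the second line uses $\sum x = n\bar{x}$.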

5b
2 marks

Compare the location (mean) and spread (standard deviation) of the size of rocks at 5 m and 25 m from the base of the cliff.

5c
2 marks

In this instance, an outlier is determined to be any data value that lies outside one standard deviation of the mean ($\bar{x} \pm \sigma$).

(i)
Find the smallest rock that is not an outlier at 5 m from the base of the cliff.

(ii)
Briefly explain why there cannot be any small rock outliers at 25 m from the base of the cliff.

6a
3 marks

The table below shows an extract from the large data set for the year 2011.
The figures shown are the number of people travelling to work by train in 7 randomly selected local authorities in the East of England.

Local Authority | Number of people travelling to work by train
Cambridge | 2 760
East Hertfordshire | 9 383
Great Yarmouth | 248
King's Lynn and West Norfolk | 976
Stevenage | 2 919
Watford | 4 897
Waveney | 517


Find the median, the upper and lower quartiles, and interquartile range.

6b
2 marks

An outlier is defined as any data value that falls either more than 1.5 × (interquartile range) above the upper quartile or less than 1.5 × (interquartile range) below the lower quartile.

Find the boundaries (fences) at which outliers are defined.

6c
2 marks

Explain why, in this case, there are no outliers.

1a
4 marks

As part of an experiment, 15 maths teachers are asked to solve a riddle and their times, in minutes, are recorded:

8 12 19 20 20
21 22 23 23 23
25 26 27 37 39

An outlier is an observation which lies more than 2 standard deviations away from the mean.

Show that there is exactly one outlier.

1b
2 marks

State, with a reason, whether the mean or the median would be the more suitable measure of central tendency for these data.

1c
2 marks

15 history teachers also completed the riddle; their times are shown below in the box plot:

[Figure: box plot of the history teachers' times]

Explain what the cross (×) represents on the box plot above. Interpret this in context.

1d
4 marks

By comparing the distributions of times taken to complete the riddle, decide which set of teachers were faster at solving the riddle.

2a
3 marks

Hugo, a newly appointed HR administrator for a company, has been asked to investigate the number of absences within the IT department.  The department contains 23 employees, and the box plot below summarises the data for the number of days that individual employees were absent during the previous quarter.

[Figure: box plot of the number of days absent]

An outlier is an observation that falls either more than 1.5 × (interquartile range) above the upper quartile or less than 1.5 × (interquartile range) below the lower quartile.

Show that these data have an outlier, and state its value.

2b
4 marks

For the 23 employees within the department, Hugo has the summary statistics:

$\sum x = 286$ and $\sum x^2 = 4238$

Hugo investigates the employee corresponding to the outlier value found in part (a) and discovers that this employee had a long-term illness.  Hugo decides not to include that value in the data for the department.

Assuming that there are no other outliers, calculate the mean and standard deviation of the number of days absent for the remaining employees.
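
Removing a single value from a data set when only summary statistics are available can be sketched as below: subtract the value from the sum, subtract its square from the sum of squares, and reduce n by one. The figures are for an illustrative data set (2, 4, 4, 4, 5, 5, 7, 9 with the 9 removed), not the absence data, and both function names are assumptions made for this example.

```python
import math


def remove_value(n, sum_x, sum_x_squared, value):
    """Return (n, sum of x, sum of x squared) after removing one observation."""
    return n - 1, sum_x - value, sum_x_squared - value ** 2


def mean_and_sd(n, sum_x, sum_x_squared):
    """Mean and standard deviation from summary statistics."""
    mean = sum_x / n
    variance = sum_x_squared / n - mean ** 2
    return mean, math.sqrt(max(variance, 0.0))


# Illustrative: start from n = 8, sum of x = 40, sum of x squared = 232,
# then remove the value 9.
n, total, total_sq = remove_value(8, 40, 232, 9)
mean, sd = mean_and_sd(n, total, total_sq)
print(n, total, total_sq, mean, sd)
```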

3a
4 marks

Sam, a zoologist, is a member of a group researching the masses of gentoo penguins.  The research group takes a sample of 100 male and 100 female penguins and records their masses.

An outlier is an observation that falls either more than 1.5 × (interquartile range) above the upper quartile or less than 1.5 × (interquartile range) below the lower quartile.

Given that values are outliers if they are less than 4.2 kg or more than 8.5 kg, calculate the upper and lower quartiles for the mass of the 200 gentoo penguins.

3b
5 marks

Casey is another member of Sam’s research group.  She believes that the masses of male and female gentoo penguins follow different distributions.  The cumulative frequency graphs below show the masses of the male and female gentoo penguins in the sample.

[Figure: cumulative frequency graphs of male and female gentoo penguin masses]

By calculating a measure of central tendency and a measure of variation, compare the two distributions.

4a
4 marks

Ms Chew is an accountant who is examining the length of time it takes her to complete jobs for her clients.  Ms Chew looks at her spreadsheet and lists the number of hours it took her to complete her last 12 jobs:

9 2 - 6 5 2 - 6 21 5 4 8

‘-’ represents a job for which the length of time taken was not recorded.

An outlier is an observation which lies more than 2 standard deviations away from the mean.

By first cleaning the data, show that 21 is the only outlier.

4b
3 marks

Ms Chew looks at her handwritten records and finds that the value 21 was typed into the spreadsheet incorrectly.  It should have been 12.

Without further calculations, explain the effect this would have on the:

(i)
mean

(ii)
standard deviation

(iii)
median.

5a
2 marks

Doris is the owner of a business in the northeast of the UK and she is considering giving her employees the option of working from home if they wish. Doris uses the large data set to investigate the change in the proportion of people in employment working from home between 2001 and 2011 in the northeast. Doris has the following information:

Percentage of people in employment who worked from home in each year

Local Authority | 2001 | 2011
County Durham | 8.3 | 9.1
Darlington | 8.3 | 8.5
Gateshead | 7.0 | 6.9
Hartlepool | 6.5 | 6.5
Middlesbrough | 6.5 | 6.1
Newcastle upon Tyne | 7.0 | 7.1
North Tyneside | 7.2 | 7.3
Northumberland | 10.8 | 12.2
Redcar and Cleveland | 7.2 | 7.2
South Tyneside | 6.4 | 6.0
Stockton-on-Tees | 7.2 | 7.3
Sunderland | 6.2 | 6.3

(i)
Calculate the mean of the percentages for the local authorities in 2001 and 2011.
(ii)
Using the large data set, Doris calculates that 7.7% of people in employment in the northeast worked from home in 2001. Explain why this value is different from the value found in part (a)(i).

5b
4 marks

Outliers are values that are more than 2 standard deviations away from the mean percentages calculated in part (a)(i).

Show that Northumberland is an outlier for both years.

5c
3 marks

Compare the distributions of the percentages of people in employment working from home in 2001 and 2011.

1a
2 marks

The lengths of unicorn horns are measured in cm.  For a group of adult unicorns, the lower quartile was 87 cm and the upper quartile was 123 cm.  For a group of adolescent unicorns, the lower quartile was 33 cm and the upper quartile was 55 cm.

An outlier is an observation that falls either more than 1.5 × (interquartile range) above the upper quartile or less than 1.5 × (interquartile range) below the lower quartile.

Which of the following adult unicorn horn lengths would be considered outliers?

32 cm 96 cm 123 cm 188 cm

1b
2 marks

Which of the following adolescent unicorn horn lengths would be considered outliers?

12 cm 52 cm 86 cm 108 cm

1c
2 marks

(i)
State the smallest length an adult unicorn horn can be without being considered an outlier.

(ii)
State the smallest length an adolescent unicorn horn can be without being considered an outlier.

2a
4 marks

The cumulative frequency diagram below shows completion times for 100 competitors at the 2019 Rubik’s cube championships.  The quickest completion time was 9.8 seconds and the slowest time was 52.4 seconds.

[Figure: cumulative frequency diagram of the 2019 completion times]

The grid below shows a box plot of the 2020 championship data.  Draw a box plot on the grid to represent the 2019 championship data.

[Figure: grid showing a box plot of the 2020 championship data]

2b
3 marks

(i)
Compare the distribution of completion times for the 2019 and 2020 championships.

(ii)
Given that the 2020 championships happened after the global pandemic, during which many competitors spent months at home, interpret your findings from part (b)(i).

3
7 marks

Students at two Karate Schools, Miyagi Dojo and Cobra Kicks, measured the force of a particular style of hit.  Summary statistics for the force, in newtons, with which the students could hit are shown in the table below:

School | $n$ | $\sum x$ | $\sum x^2$
Miyagi Dojo | 12 | 21873 | 41532545
Cobra Kicks | 17 | 29520 | 52330890

(i)
Calculate the mean and standard deviation for the forces with which the students could hit.

(ii)
Compare the distributions for the two Karate Schools.

4a
4 marks

The heights, in metres, of a flock of 20 flamingos are recorded and shown below:

0.4 0.9 1.0 1.0 1.2 1.2 1.2 1.2 1.2 1.2
1.3 1.3 1.3 1.4 1.4 1.4 1.4 1.5 1.5 1.6


An outlier is an observation that falls either more than 1.5 × (interquartile range) above the upper quartile or less than 1.5 × (interquartile range) below the lower quartile.

(i)
Find the values of Q1, Q2 and Q3.

(ii)
Find the interquartile range.

(iii)
Identify any outliers.

4b
3 marks

Using your answers to part (a), draw a box plot for the data.

5a
3 marks

The numbers of daily Covid-19 vaccinations reported by one vaccination centre over a 14-day period are given below:

237 264 308 313 319 352 378
378 405 421 428 450 465 583


Given that $\sum x = 5301$ and $\sum x^2 = 2\,113\,195$, calculate the mean and standard deviation for the number of daily vaccinations.

5b
2 marks

An outlier is an observation which lies more than 2 standard deviations away from the mean.

Identify any outliers for this data.

5c
3 marks

By removing any outliers identified in part (b), clean the data and recalculate the mean and standard deviation.

6
7 marks

The cumulative frequency diagram below shows the distribution of income of 120 managers across a supermarket chain.

[Figure: cumulative frequency diagram of managers' incomes]

The incomes of a sample of 120 other employees across the supermarket chain are recorded in the table below.

Income I (£ thousand) | Frequency
0 ≤ I < 20 | 34
20 ≤ I < 40 | 28
40 ≤ I < 60 | 27
60 ≤ I < 80 | 17
80 ≤ I < 100 | 10
100 ≤ I < 120 | 4


On the grid above, draw a cumulative frequency graph to show the data for the other employees and compare the income of managers and other employees.

7a
2 marks

An extract of data from the large data set on the number of people who worked from home in 2011 in each region of England and Wales is given below.

Region | Number of Local Authorities | Total number of people working from home
North East | 12 | 92 336
North West | 39 | 290 983
Yorkshire and The Humber | 21 | 224 802
East Midlands | 40 | 215 773
West Midlands | 30 | 246 011
East of England | 47 | 304 889
London | 33 | 380 665
South East | 67 | 502 584
South West | 37 | 323 789
Wales | 22 | 142 178


Calculate the mean number of people per Local Authority who work from home.

7b
3 marks

Any value more than two standard deviations from the mean can be identified as an outlier. The mean number of people per region who work from home is 272 401 and the standard deviation is 111 488.6 to 1 decimal place.

(i)
Using this definition of an outlier, state which region is an outlier. Fully justify your answer.
(ii)
Explain what other available information may indicate why this region is an outlier.

1a
2 marks

Marya is consistently late for work. David, Marya’s boss, records the number of minutes that she is late during the next six days. David calculates the mean is 18 minutes and the variance is 210 minutes². On one of the six days, Marya was 50 minutes late.

Show that 50 is an outlier, using the definition that outliers are more than 2 standard deviations away from the mean.

1b
2 marks

Marya states that the 50 minutes should not be included as it is an outlier.

(i)
Give a reason why Marya wants the 50 minutes to be excluded from the data set.

(ii)
Give a reason why David wants the 50 minutes to be included in the data set.

1c
5 marks

Marya tells David that she was 50 minutes late that day due to a road accident. She shows David the traffic report as evidence.

David agrees to remove the 50 from the data set. Calculate the new mean and standard deviation for the remaining values.

2
8 marks

For each scenario state, with a reason, whether the identified outlier should be included or excluded in the data set.

(i)
Alice is collecting the ages of children in a school classroom. The outlier is the age of 29.

(ii)
Benji records the times taken for some athletes to run a mile. The outlier is the time of 7 seconds.

(iii)
Carlos is collecting data on the number of hours of sunlight per day for the city of Burrow, located in the north of North America. The outlier is the value of 23.4 hours.

(iv)
Daisy is collecting data on the heights of cows; the median height is 161 cm. The outlier is the height 189 cm.

3a
3 marks

The cumulative frequency graph below shows information about the lengths of time taken for 80 students to run a lap of the sports hall.

[Figure: cumulative frequency graph of lap times]

Complete the table below:

Time (t seconds) | 20 < t ≤ 40 | 40 < t ≤ 60 | 60 < t ≤ 80 | 80 < t ≤ 100
Frequency | 8 | | |

3b
3 marks

Hence estimate the mean and the standard deviation of the times.
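
Estimating a mean and standard deviation from a grouped frequency table is usually done by treating every value in a class as sitting at the class midpoint. The sketch below illustrates this with hypothetical classes and frequencies that are deliberately different from the table in this question. The function name `grouped_mean_and_sd` is an assumption for the example.

```python
import math


def grouped_mean_and_sd(classes):
    """Estimate mean and SD from (lower, upper, frequency) triples.

    Each class is represented by its midpoint, so the results are estimates.
    """
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
    mean = sum_fx / n
    variance = sum_fx2 / n - mean ** 2
    return mean, math.sqrt(max(variance, 0.0))


# Hypothetical classes and frequencies, for illustration only.
classes = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]
mean, sd = grouped_mean_and_sd(classes)
print(mean, sd)
```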

3c
3 marks

Given that the fastest time was 21 seconds and the slowest time was 100 seconds, show that these values are outliers using the definition that an outlier is more than 2 standard deviations away from the mean.

4a
3 marks

Tim has just moved to a new town and is trying to choose a doctor’s surgery to join, HealthHut or FitFirst. He wants to register with the one where patients get seen faster. He takes a sample of 150 patients from HealthHut and calculates the range of waiting times as 45 minutes and the variance as 121 minutes².

An outlier is defined as a value which is more than 2 standard deviations away from the mean.

Prove that the sample contains an outlier.

4b
2 marks

Tim finds out that the outlier is a valid piece of data and decides to keep the value in his sample.

Which pair of statistical measures would be more appropriate to use when using the sample to compare the doctor’s surgeries: the mean and standard deviation or the median and interquartile range? Give a reason for your answer.

4c
1 mark

The box plots below show the waiting times for the two surgeries.

[Figure: box plots of waiting times for HealthHut and FitFirst]

Given that there is only one outlier for HealthHut, label it on the box plot with a cross (×).

4d
4 marks

Compare the two distributions of waiting times in context.

5a
2 marks

Simon, an economist, is investigating the trends in employment rates in London and Wales. The large data set for 2001 does not show the number of people that are not in employment.

Below is an extract from the large data set from 2001:

Local authority: district / unitary | All categories of people in employment | All usual residents
Newham | 86 428 | 243 891


Using your knowledge of the large data set, explain why there is not enough information in the table above to calculate the number of people that are not in employment.

5b
1 mark

Simon wants to compare unemployment between the two regions for each year.

Explain why Simon should use the proportions of unemployed people in each local authority instead of the number of unemployed people.

5c
5 marks

Simon calculates the median and interquartile range of the unemployment rates using all of the local authorities in London and Wales for the years 2001 and 2011.

Region | 2001 median | 2001 interquartile range | 2011 median | 2011 interquartile range
London | 40.5% | 5.7% | 34.8% | 5.1%
Wales | 48.6% | 5.6% | 39.2% | 4.2%


Using the information in the table, compare the unemployment rates for London and Wales and discuss how they have changed between 2001 and 2011.
