AQA AS Maths: Statistics

Revision Notes


6.1 Large Data Set


Using a Large Data Set

What is a large data set?

  • As part of your course there is a large data set that you can use
  • It contains lots of information
  • You are not expected to memorise any results from the data
  • You will have an advantage if you are familiar with the large data set
    • Understand what the variables are
    • Understand the terminology used
    • Understand the context
  • You will not get a copy of the large data set in your exam
    • If you are required to calculate anything using the large data set, you will be given an extract within the question

What skills can I practise with a large data set?

  • Cleaning data
    • There might be missing data
    • You could identify outliers and question their validity
  • Sampling and hypothesis testing
    • You can practise different methods of sampling using the data
    • You could use a sample to test a hypothesis
  • Statistical measures and diagrams
    • You could calculate summary statistics for different variables
    • You could create different diagrams
    • You can interpret the summary statistics and diagrams (as it is real data you could explore the context behind the results)
    • You could compare summary statistics and diagrams

Do I have to use spreadsheets and other technology?

  • You will not be assessed on using spreadsheets
    • However, it is a useful skill for your future career
  • You could use technology to calculate the summary statistics and create the statistical diagrams
    • This will help you to practise these skills whilst using real data
    • Spreadsheets can calculate summary statistics
    • In the exam you could use the statistics mode on your calculator
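As a rough illustration of using technology for summary statistics, the sketch below uses Python's built-in statistics module in place of a spreadsheet. The engine sizes are made-up values chosen for illustration; they are not taken from the large data set.

```python
import statistics

# Hypothetical engine sizes in cc (illustrative values, NOT from the large data set)
engine_sizes = [998, 1242, 1398, 1598, 1598, 1968, 1995, 2993]

mean_size = statistics.mean(engine_sizes)        # arithmetic mean
median_size = statistics.median(engine_sizes)    # middle value of the ordered data
sd_size = statistics.pstdev(engine_sizes)        # standard deviation, dividing by n
data_range = max(engine_sizes) - min(engine_sizes)

print(mean_size, median_size, round(sd_size, 1), data_range)
```

The same measures can be found with spreadsheet functions or with the statistics mode on a calculator; the point is to practise interpreting them in context.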

Summary of the Large Data Set

What is the data about?

  • The large data set for AQA comes from the UK Department for Transport Stock Vehicle Database (loosely referred to as “Cars” or “Vehicle data”)
    • The full database is too large to work with directly, so AQA has extracted some of the data into a spreadsheet; this extract is what should be used to study parts of the statistics course
  • Some of the data in the spreadsheet is coded, so keep a close eye on the information contained under “Definition of fields” and “Field Values”
    • Beware! Because the codes are numbers, it may look as though you can calculate statistics such as the mean with them, but this would not make sense

e.g. “The mean of the propulsion type data is 2 so the mean propulsion type is diesel” does not make sense, but it may be okay to say “diesel is the modal (most frequent) propulsion type of vehicles in the sample”

  • You are likely to be asked to “use your knowledge of the large data set” – this is where familiarity with its key features can be an advantage
    • e.g. knowing that the mass of a vehicle includes an average 75 kg driver
  • Only mention things that can be justified from the data set
    • e.g. there is only one electric vehicle in the whole data set, so do not use or assume things you may have heard about electric cars in the news recently
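The earlier warning about coded values can be seen in a short sketch. The list of codes below is a made-up sample, and only the two labels stated in these notes (1 = petrol, 2 = diesel) are filled in; any other code would need to be looked up under “Field Values”.

```python
import statistics

# Hypothetical sample of PropulsionTypeid codes (illustrative, NOT real data)
codes = [1, 2, 2, 1, 2, 8, 2, 1, 2, 3]

# Only the labels given in the notes; other codes are deliberately left out
labels = {1: "petrol", 2: "diesel"}

mean_code = statistics.mean(codes)    # a number, but not a meaningful "average fuel type"
modal_code = statistics.mode(codes)   # the most frequent code does have a meaning
modal_label = labels.get(modal_code, "see Field Values")

print(mean_code)     # 2.4
print(modal_label)   # diesel
```

A mean of 2.4 does not correspond to any fuel type, whereas the mode identifies the most common category in the sample.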

What variables are included in the large data set?

  • Reference
    • A unique number given to each individual vehicle by AQA to index the data
    • Could be used to easily identify a vehicle and all its information
  • The first few pieces of data about the vehicles are qualitative
  • Make
    • Only the five most frequently registered makes are included
    • BMW, Ford, Toyota, Vauxhall and Volkswagen
  • PropulsionTypeid
    • A data value of 1, 2, 3, 7 or 8 indicates the type of fuel powering the vehicle
      (4, 5 and 6 are not used in the AQA extracted dataset)
    • 1 indicates a petrol-powered vehicle and 2 a diesel-powered vehicle
    • The full codes are listed under “Field Values”
  • BodyTypeid
    • Also given by coded values defined in “Field Values”, these represent the style of vehicle, including (amongst others) convertibles and MPVs (multi-purpose vehicles)
  • GovRegion
    • The database only includes cars registered in England (rather than the UK)
    • The region of a vehicle is determined by the postcode of the current registered keeper
    • The regions included are London, North West and South West
  • KeeperTitleid
    • The last of the coded values defined in “Field Values” represents whether the current registered keeper is male, female, a company or unknown
  • The remaining data values are all quantitative
  • Engine size
    • Size (capacity) of the engine measured in cubic centimetres (cc)
  • Year registered
    • Vehicles included in the extract were first registered in either 2002 or 2016
    • The introduction says the precise dates are
      • 3 June 2002 – 9 June 2002
      • 6 June 2016 – 12 June 2016
    • Knowing that only a few days from each year are included gives an idea of the enormity of the full database
  • Mass
    • Measured in kilograms (kg)
      • the mass of an average driver (75 kg) is included in the figures quoted
  • Emissions
    • The remaining data values centre around the emissions from the vehicles
      • CO2 – carbon dioxide emissions, measured in g/km
      • CO – carbon monoxide emissions, measured in g/km
      • NOX – oxides of nitrogen emissions, measured in g/km
      • part – particulate emissions, measured in g/km
        (this measure only applies to diesel cars)
      • hc – hydrocarbon emissions, measured in g/km
  • Random number
    • A random number is generated by the spreadsheet for each vehicle; it is not part of the data set but can be used to randomly select vehicles in sampling
    • Be aware that the random number is regenerated each time the spreadsheet is refreshed
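One way the random number column could be used for sampling is to sort the vehicles by their random number and take the first few. The sketch below imitates this in Python; the reference numbers, the sample size and the fixed seed are all assumptions made for illustration.

```python
import random

random.seed(1)  # fixed seed so this sketch is repeatable; the spreadsheet's numbers change on refresh

# Hypothetical reference numbers standing in for vehicles in the extract
references = list(range(1, 21))

# Attach a random number to each vehicle, as the spreadsheet column does
tagged = [(random.random(), ref) for ref in references]

# Sort by the random number and keep the first n vehicles as the sample
n = 5
sample = [ref for _, ref in sorted(tagged)[:n]]

print(sample)
```

Because every vehicle is equally likely to receive one of the smallest random numbers, this gives a simple random sample without replacement.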

Is the data complete?

  • Various data values are blank within the spreadsheet; others are 0 where this makes no sense (such as the mass of the car)
    • There is no information as to why these occur but be aware they exist
    • Under the “Definition of fields” tab there is some extra information about the emissions data
      • CO2 emissions are known for 83% of vehicles in the whole database
      • CO emissions are known for 82% of vehicles in the whole database
      • NOX emissions are known for 81% of vehicles in the whole database
      • Part – only for diesel vehicles (24% of the whole database)
      • Hc emissions are known for 51% of vehicles in the whole database
  • The above means that the data should be cleaned before samples are taken
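A minimal sketch of what cleaning might look like for the mass variable is shown below. The rows are made up for illustration, with None standing in for a blank cell; the rule used (remove blank or zero masses) is one reasonable choice, not a prescribed method.

```python
# Hypothetical rows: (reference, mass in kg); None stands for a blank cell
rows = [(1, 1340), (2, None), (3, 0), (4, 1515), (5, 1180)]


def is_valid_mass(mass):
    """A mass is usable only if it is present and greater than zero."""
    return mass is not None and mass > 0


# Keep only vehicles with a usable mass, and note which ones were removed
cleaned = [(ref, mass) for ref, mass in rows if is_valid_mass(mass)]
removed = [ref for ref, mass in rows if not is_valid_mass(mass)]

print(cleaned)   # [(1, 1340), (4, 1515), (5, 1180)]
print(removed)   # [2, 3]
```

Sampling would then be carried out on the cleaned list, so that blank or zero values cannot end up in the sample.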

What are the key features I need to know about the data set?

  • These have been mentioned in the lists above but here is a summary of those we have seen used in exam and practice papers
    • There are only five makes, and Ford was the most frequently registered
    • There is only one electric vehicle in the database
    • Data is from a few days in summer and only in two years – 2002 and 2016
    • The mass of a vehicle includes an average 75 kg driver
    • Emissions data (CO2, CO and NOX) is only known for around 80% of the whole database
    • Particulate emissions are only applicable to diesel cars

Worked Example

Jay collects data on the masses of vehicles first registered in 2002, taking a random sample of size 30.

(a)
Use your knowledge of the large data set to explain why Jay should clean the data before taking a sample.

(b)
Jay’s calculations show the mean mass of a vehicle in his sample is 1340 kg.
Using your knowledge of the large data set, write down an estimate for the mean mass of an empty vehicle in the whole database, justifying your answer.

(a) [Worked solution image: 6-1-1-aqa-we-solution-part-1]

(b) [Worked solution image: 6-1-1-aqa-we-solution-part-2]
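The arithmetic behind part (b) can be sketched as follows. This assumes the intended approach is to use the sample mean as the estimate and subtract the 75 kg average driver that the notes say is included in every recorded mass; it is an illustration, not a reproduction of the original worked solution.

```python
DRIVER_MASS_KG = 75          # average driver included in each recorded mass (from the notes)
sample_mean_mass_kg = 1340   # Jay's sample mean from the question

# Subtracting the same constant from every value lowers the mean by that constant
empty_mean_estimate_kg = sample_mean_mass_kg - DRIVER_MASS_KG

print(empty_mean_estimate_kg)  # 1265
```

Note that Jay's sample only covers vehicles first registered in 2002, so the estimate for the whole database should be treated with some caution.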

Exam Tip

  • As vehicle emissions are frequently mentioned in news articles, be wary of confusing popular opinion with what can be justified using the information contained within the large data set.


Author: Paul

Paul has taught mathematics for 20 years and has been an examiner for Edexcel for over a decade. GCSE, A level, pure, mechanics, statistics, discrete – if it’s in a Maths exam, Paul will know about it. Paul is a passionate fan of clear and colourful notes with fascinating diagrams – one of the many reasons he is excited to be a member of the SME team.

© Copyright 2015-2022 Save My Exams Ltd. All Rights Reserved.
IBO was not involved in the production of, and does not endorse, the resources created by Save My Exams.