Population & Sampling (Edexcel GCSE Statistics)

Revision Note

Author: Roger

Expertise: Maths

Population & Sample Types

What are populations, samples and sampling frames?

  • The population refers to the whole set of things which you are interested in

    • e.g. if a vet wanted to know how long a typical French bulldog sleeps for in a day

      • then the population would be all the French bulldogs in the world

    • Be careful - the word 'population' can mean different things in different contexts

      • e.g. 'the population of the UK' is usually used to refer to everyone in the UK

      • But if you're studying UK dentists then the 'population' for your study would be restricted to all the dentists in the UK

  • A sample refers to a subset of the population which is used to collect data from

    • e.g. out of all the French bulldogs in the world (the population)

      • a vet might take a sample of French bulldogs from different cities and record how long they sleep in a day

  • A sampling frame (or sample frame) is a list of all members of the population

    • For example, a list of employees’ names within a company

    • Not every population will have an easily-accessible sampling frame

What's the difference between a census and a sample?

  • A census collects data about all the members of a population

    • e.g. a national census is carried out in the UK every 10 years to collect data about every person living there at the time

    • The main advantage of a census is that it should give fully accurate results, because every member of the population is included

    • The disadvantages of a census are:

      • It is time consuming and expensive to carry out

      • It can destroy or use up all the members of a population (imagine a company testing every single firework it produces)

  • Sampling is used to collect data from a subset of the population

    • The advantages of sampling are:

      • It is quicker and cheaper than a census

      • It leads to less data needing to be analysed

    • The disadvantages of sampling are:

      • It might not represent the population accurately

      • It could introduce bias, if some parts of the population are more represented in the sample than others

What different sampling techniques do I need to know?

Random sampling methods

  • Simple random sampling: here every member of the population has an equal probability of being selected for the sample

    • To select a simple random sample of n members of the population

      • Uniquely number every member of the population

      • Then randomly select n different numbers using a random number generator (or other form of random selection)

  • Stratified sampling: the population is divided into separate groups (called strata) and then a random sample is taken from each group (stratum)

    • The proportion of a sample that belongs to a stratum is equal to the proportion of the population as a whole that belongs to that stratum

      • e.g. if 1/20 of the population belongs to a particular stratum

      • then 1/20 of the sample should come from that stratum

    • A population could be split into strata by age ranges, gender, occupation, etc.

  • See the spec points on 'Random Samples' and 'Stratified Samples' for more info on these two methods
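The two steps for simple random sampling described above can be sketched in Python. This is only an illustration and not part of the exam specification: the function name `simple_random_sample`, the population of 50 numbered members and the sample size of 5 are all made-up assumptions.

```python
import random

def simple_random_sample(population_size, n, rng=random):
    """Pick n different member numbers from 1..population_size."""
    # Step 1: uniquely number every member of the population (1, 2, 3, ...).
    numbered_members = range(1, population_size + 1)
    # Step 2: randomly select n different numbers.
    # random.sample never selects the same number twice.
    return rng.sample(numbered_members, n)

# Hypothetical example: 50 members, sample of 5.
sample = simple_random_sample(50, 5)
print(sample)
```

Every number from 1 to 50 has the same chance of appearing in the output, which is what makes the sample a simple random sample.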

Non-random sampling methods
Note: some of these methods include random elements, but the samples as a whole are not random

  • Judgement sampling: here you simply use your judgement to choose a sample of the population

    • You should attempt to make sure that the sample is representative of the population as a whole

  • Opportunity (convenience) sampling: a sample is formed using available members of the population who fit the study criteria

    • e.g. for a study of UK consumers you could stand on a street corner and interview the first 50 people who walk by

  • Cluster sampling: the population is divided into sensible 'clusters' and then a number of clusters are chosen at random to form the sample

    • e.g. a study of UK education might use schools as the clusters

    • then select 50 schools at random and use the people in those schools as the sample

  • Systematic sampling: a sample is formed by choosing members of a population at regular intervals using a list (sampling frame)

    • e.g. to select 1/10 of the students in a school as a sample

      • Start with a list of all students

      • Select one student at random as a 'starting point'

      • Then also select every 10th student on the list after that starting point

      • (If necessary, wrap back around to the start of the list when you get to the end)

  • Quota sampling: the population is split into groups (like in stratified sampling) and a quota is specified for each group

    • The quota specifies how many members of the population are to be selected from each group

      • This will often be done in the same way as selecting the sizes of the strata in stratified sampling

      • Or other criteria could be used to set the quota for each group

    • Members of the population are selected until each quota is filled

      • If a member does not want to be included then another member is chosen instead

    • The members do not have to be selected randomly
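As a rough sketch, the systematic sampling steps listed above (random starting point, then every 10th member, wrapping around if needed) might look like this in Python. The function name `systematic_sample` and the list of 200 numbered students are hypothetical, chosen only for illustration.

```python
import random

def systematic_sample(frame, interval, rng=random):
    """Choose a random starting point, then every `interval`-th member after it."""
    start = rng.randrange(len(frame))       # random starting position in the list
    sample_size = len(frame) // interval    # e.g. 1/10 of the list when interval is 10
    # Step through the list at regular intervals,
    # wrapping back around to the start when the end is reached.
    return [frame[(start + i * interval) % len(frame)] for i in range(sample_size)]

# Hypothetical sampling frame: 200 students labelled by number.
students = [f"student {i}" for i in range(1, 201)]
sample = systematic_sample(students, 10)
print(len(sample))  # 20
```

Only the starting point is random here. Once it has been chosen, the rest of the sample is completely determined by the order of the list.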

What are the advantages and disadvantages of different sampling techniques?

  • In general

    • Most sampling techniques can be improved by taking a larger sample

    • You want to minimise the bias within a sample

      • This occurs when the sample is not representative of the population

      • The best way to avoid bias (when possible) is to use a random method

    • Sometimes the 'best' method would cost too much or take too much time

      • So you need to choose the 'best method you can afford (or have the time for)'

    • A sample only gives information about the members in the sample

      • A different sample from the same population could lead to different conclusions about the population!

  • Simple random sampling:

    • This is the best sampling method for avoiding bias

      • Although it is possible that members of some groups in the population will not be represented in the sample

      • To avoid this stratified sampling can be used instead

    • Most useful when you have a small population or want a small sample

      • e.g. children in a class

    • This cannot be used if it is not possible to number or list all the members of the population

      • e.g. the fish in a lake

  • Stratified sampling:

    • This should be used when the population can be split into obvious groups

    • Useful when there are very different groups of members within a population

    • The sample will be representative of the population structure

      • Members of every group (stratum) are guaranteed to be included in the sample

    • The members selected from each stratum are chosen randomly

      • This helps to avoid bias

    • This cannot be used

      • if the population cannot be split into groups

      • or if the groups overlap

  • Systematic sampling:

    • This is useful when you want a sample from a large population

    • You need access to a sampling frame (list of the population)

      • If the order of the sampling frame is random then the sample will also be random

    • This cannot be used if it is not possible to number or list all the members of the population

      • e.g. penguins in Antarctica

    • Be careful of periodic (i.e. regularly recurring) patterns in the sampling frame

      • e.g. a list of names where the names are grouped by 5-person teams with the team captain appearing first

      • If you selected every 5th name in the list you would end up with either all captains or no captains in your sample

  • Quota sampling:

    • This is useful when a small sample is needed to be representative of the population structure

    • Useful when collecting data by asking people who walk past you in a public place or when a sampling frame is not available

      • Just keep asking people until the quota is filled for each group

    • This can introduce bias as some members of the population might choose not to be included in the sample

  • Cluster sampling:

    • This will usually require less time and be less expensive than simple random sampling or stratified sampling

      • e.g. if your clusters are schools, you will only need to collect data from the people in some of those schools

      • instead of having to collect data from a few people in every school in the country

    • However the clusters may not be representative of the population structure as a whole

      • This can make the sample biased

  • Opportunity (convenience) sampling:

    • This should be used when a sample is needed quickly

    • Useful when it is not possible to list all the members of the population

    • But the sample is unlikely to be representative of the population structure

      • This can make the sample biased

  • Judgement sampling:

    • This can be used when a sample is needed quickly

    • The person choosing the sample should try to make it representative of the population

      • But intentionally or unintentionally the sample can end up being biased

      • Therefore this is rarely a preferred method
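The warning about periodic patterns in systematic sampling can be checked with a short Python sketch. The list below is invented for illustration: 4 teams of 5 people, with the captain listed first in each team. The helper name `every_fifth` is also an assumption.

```python
# Hypothetical list: 4 teams of 5, with the captain listed first in each team.
frame = []
for team in range(1, 5):
    frame.append(f"team {team} captain")
    frame.extend(f"team {team} member {m}" for m in range(1, 5))

def every_fifth(frame, start):
    """Systematic sample: every 5th name from position `start`, wrapping around."""
    return [frame[(start + 5 * i) % len(frame)] for i in range(len(frame) // 5)]

# Count the captains in the sample for each possible starting position in a team.
captain_counts = {}
for start in range(5):
    sample = every_fifth(frame, start)
    captain_counts[start] = sum("captain" in name for name in sample)

print(captain_counts)  # {0: 4, 1: 0, 2: 0, 3: 0, 4: 0}
```

Whichever starting point is chosen, the sample contains either all of the captains or none of them, so it cannot reflect the mix of captains and other team members in the population.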

Worked Example

Aaron, Belinda and Charlotte are writing an article about school uniforms for their school newsletter. They want to interview a sample of 30 students to find out their opinions about school uniforms.

(a) Write down the population for the survey, and suggest a possible sampling frame.

Be careful with the population here
They only want to interview students, so the population for their survey is only the students in the school
It does not include teachers or other staff members

The population is all the students in the school.
A sampling frame could be an alphabetical list of all the students in the school.

Aaron suggests that he could stand by the school gates in the morning and interview the first 30 students that come past him.

(b) Name this type of sampling and suggest a possible disadvantage.

Opportunity sampling

The sample could be biased. For example, Aaron could end up interviewing only people who have just arrived on the same bus, or groups of friends or siblings arriving at school together.

Belinda suggests that instead Aaron should interview students at the school gates until he has interviewed exactly 6 students from each of the school's year groups (years 7 through 11).

(c) Name this type of sampling and suggest a reason why it would be an improvement over Aaron's original plan.

Quota sampling

The sample would probably be more representative of all the students in the school, because it would be certain to include students from each year group.
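Belinda's plan can be sketched in Python as follows. This is only an illustration: the function name `quota_sample` and the simulated list of 200 arriving students (40 per year group, in a shuffled order) are assumptions, not part of the question.

```python
import random

def quota_sample(passers_by, quota_per_group):
    """Interview people in arrival order until every group's quota is full."""
    counts = {group: 0 for group in quota_per_group}
    sample = []
    for name, group in passers_by:
        if counts[group] < quota_per_group[group]:
            sample.append((name, group))   # this group's quota is not yet full
            counts[group] += 1
        if counts == quota_per_group:
            break                          # every quota has been filled
    return sample

# Hypothetical arrivals: 40 students from each of years 7 to 11, in a shuffled order.
# (The shuffle only simulates who walks past; the selection itself is not random.)
years = [7, 8, 9, 10, 11] * 40
random.shuffle(years)
arrivals = [(f"student {i}", year) for i, year in enumerate(years, start=1)]

quotas = {year: 6 for year in range(7, 12)}
sample = quota_sample(arrivals, quotas)
print(len(sample))  # 30
```

Students whose year group is already full are simply skipped, so the final sample always has 6 students from each year group, but who gets chosen still depends on who happens to arrive first.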

In the end, Aaron, Belinda and Charlotte decide to use systematic sampling to select their sample.

(d) Given that there are 480 students in the school, suggest how they might go about choosing their sample.

They are going to need to select names from a list
But first we need to know what proportion of the students in the school they want to interview
Divide the number in their sample (30) by the total number of students (480)

$\frac{30}{480} = \frac{3}{48} = \frac{1}{16}$

So they want their sample to contain 1/16 of the students in the school
This means they need to choose every 16th name in the list (after the random starting point)

They will need a list of all the students in the school to use as a sampling frame.

They need to randomly select one student from the list as a starting point, then also select every 16th student from the list after that.

They may need to 'wrap back around' to the start of the list to get all 30 names for their sample.

Random Samples

What do I need to know about random sampling?

  • In a simple random sample every member of the population has an equal probability of being selected for the sample

    • This means that the sample selection is fair and unbiased

      • Therefore the sample is likely to be representative of the population

    • To minimise bias this will usually be the best method

      • But it can also be expensive and time-consuming

      • And some groups in the population may end up not being represented in the sample

  • Some other sampling methods also include random selection

    • In stratified sampling, the members of the population chosen from each stratum (group) are chosen randomly

      • A 'simple random sample' is taken from each stratum

      • This leads to relatively unbiased samples that also reflect the population structure

    • In cluster sampling, the clusters to include in the sample are selected randomly

      • This can give good results if the clusters are representative of the population as a whole

    • In systematic sampling the 'starting point' member in the population list is chosen randomly

      • This is not considered a 'random sample' unless the ordering of the list is also random

How is a random sample selected?

  • To take a simple random sample you need to have access to a list of all members of the population (i.e. a sampling frame)

    • Every member in the sampling frame must be assigned a number

      • Usually this will mean starting at 1

      • and numbering the rest of the list in order: 2, 3, 4, etc.

  • To select a random sample of n members of the population

    • n random numbers must be generated

    • The members in the list with those numbers are then selected for the sample

  • You should be familiar with the different options for choosing random numbers

  • Random numbers can be selected from a random number table

    • The numbers in the table may have more digits than you need

      • e.g. you want 2-digit numbers but the table shows
        469066      155387      172419      953505

    • In this case you can break the table numbers into smaller numbers

      • So here read the numbers in the table as
        46    90    66      15    53    87      17    24    19      95    35    05

      • '05' is just a two-digit way of writing '5'

  • You could use a random number generator on a calculator

    • This may give you random 3-digit decimals between 0 and 1

      • e.g. 0.541, 0.414, 0.929

      • These can be multiplied by 1000 to give you integer answers: 541, 414, 929

    • Or you may be able to ask for random integers between two values

      • e.g. a random integer between 1 and 6

      • This would be just like rolling a fair 6-sided dice

  • Apps on a computer or online can also generate random numbers

    • These apps usually let you specify what you want

      • how many numbers

      • between what values

  • You can select random numbers by rolling dice

    • A fair 10-sided dice can give values between 0 and 9

    • Use two 10-sided dice (or roll one dice twice) to get numbers between 0 and 99

      • i.e., one dice for the tens (0, 10, 20, 30, ...) and one dice for the units (0, 1, 2, 3, ...)

    • Or use three 10-sided dice to get numbers between 0 and 999

      • i.e., one for hundreds, one for tens, and one for units

  • You could also put all the numbers into a hat (or bag, etc.)

    • And draw numbers out at random

      • This is less easy to do with a lot of numbers!

    • Similarly, numbers might be drawn at random from a deck of cards with the numbers written on them

  • You should know how to deal with problems that occur when choosing random numbers

  • You may get a random number that does not match any of the items in your list

    • e.g. if you have 80 items in a list numbered 1 to 80

      • but get the random number 93

    • If this happens, simply ignore any numbers that don't match

      • and keep generating random numbers until you have enough that do match items in the list

  • You may get a random number that occurs more than once

    • In this case keep the first version of the number

      • and ignore any repeated versions

    • Keep generating random numbers until you have enough unique numbers

      • i.e. ones that only occur one time each
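The two rules above (ignore numbers that do not match an item, and ignore repeats) can be sketched in Python. This is an illustration only: the function name `pick_member_numbers` is made up, and the 2-digit random numbers are generated with `random.randint` to stand in for two 10-sided dice.

```python
import random

def pick_member_numbers(list_size, n, rng=random):
    """Generate 2-digit random numbers (0 to 99), keeping only the usable ones.

    Assumes list_size is at most 99 and n is at most list_size.
    """
    chosen = []
    while len(chosen) < n:
        number = rng.randint(0, 99)    # like rolling two 10-sided dice
        if number < 1 or number > list_size:
            continue                   # does not match any item in the list: ignore it
        if number in chosen:
            continue                   # a repeat: keep only the first occurrence
        chosen.append(number)
    return chosen

# The list of 80 items from the example above, with a sample of 10.
print(pick_member_numbers(80, 10))
```

A number such as 93 is thrown away and another number is generated, so the loop keeps going until there are enough different, usable numbers.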

Worked Example

Florence has a list of her company's 832 customers. She would like to choose a simple random sample of 12 customers to survey about some new changes she was thinking of making to the company website.

(a) State an advantage of using simple random sampling to choose the 12 people to interview.

Using simple random sampling means every customer has an equal chance of being chosen. This should minimise possible bias in the sample.

Florence finds the following list of random numbers in a table of random numbers.

855737      648311      989903      068440      922412      748392

445546      862885      418648      010910      148805      533291

927476      920027      688416      013932      766179      811230

(b) Explain how Florence can use her customer list along with those random numbers to select the 12 customers to interview. In using the numbers from the table, you should start at the top left and work across from left to right.

First Florence will need to prepare her sampling frame (i.e. her customer list)

Florence should start by numbering the customers in her list from 1 to 832.

To get random numbers up to 832 Florence will need random 3-digit numbers
To do this she can think of each number in the table as being two separate 3-digit numbers:

855 737 648 311 989 903 068 440 922 412 748 392

445 546 862 885 418 648 010 910 148 805 533 291

927 476 920 027 688 416 013 932 766 179 811 230

Starting with the first row, ignore any numbers that are greater than 832 (here 855, 989, 903 and 922):

855 737 648 311 989 903 068 440 922 412 748 392

Use the numbers that are left
Note that '068' is the 3-digit version of 68

From the first row: 737, 648, 311, 68, 440, 412, 748, 392

That's 8 numbers, so she needs 4 more
Continue with the second row:

445 546 862 885 418 648 010 910 148 805 533 291

Ignoring 862 and 885 (both greater than 832), the first 4 usable numbers are 445, 546, 418 and 648
But 648 has been chosen already
So ignore that and use the next number instead ('010'=10)

From the second row: 445, 546, 418, 10

From the numbered customer list she should choose the customers with the following numbers for her sample:

737, 648, 311, 68, 440, 412, 748, 392, 445, 546, 418, 10

Florence could also, for example, have just used the first 3 digits in each of the numbers in the table. This would have given her the following numbers for her sample:

648, 68, 748, 445, 418, 10, 148, 533, 688, 13, 766, 811
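As a check on the working for part (b), here is a short Python sketch of the same method: split each 6-digit table entry into two 3-digit numbers, then ignore anything greater than 832 and any repeats. The function name `select_from_table` is an assumption made for this illustration; the table entries are the ones given in the question.

```python
# Random number table from the question, read left to right, row by row.
table = [
    "855737", "648311", "989903", "068440", "922412", "748392",
    "445546", "862885", "418648", "010910", "148805", "533291",
    "927476", "920027", "688416", "013932", "766179", "811230",
]

def select_from_table(table, list_size, n):
    """Return the first n usable, non-repeated 3-digit numbers from the table."""
    chosen = []
    for entry in table:
        # Read each 6-digit entry as two separate 3-digit numbers.
        for chunk in (entry[:3], entry[3:]):
            number = int(chunk)    # e.g. '068' becomes 68
            if 1 <= number <= list_size and number not in chosen:
                chosen.append(number)
            if len(chosen) == n:
                return chosen
    return chosen  # the table ran out before n numbers were found

print(select_from_table(table, 832, 12))
# [737, 648, 311, 68, 440, 412, 748, 392, 445, 546, 418, 10]
```

The output matches the 12 customer numbers found by hand, including skipping the second 648.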

Stratified Samples

How is a stratified sample selected?

  • To take a stratified sample, the population must first be divided into a number of groups (strata)

    • Every member of the population must belong to one group

    • No member of the population can belong to more than one group

      • i.e. the groups cannot overlap

    • The strata could be based on age ranges, gender, occupation, etc.

  • The number of members chosen from each stratum corresponds to the proportion of the population that belongs to that stratum

    • e.g. if 1/20 of the population belongs to a particular stratum

      • then 1/20 of the sample will be chosen from that stratum

  • To find the number to be chosen from each stratum use the formula:

    • $\text{number from stratum} = \dfrac{\text{size of stratum}}{\text{size of population}} \times \text{size of sample}$

      • i.e. divide the size of the stratum by the size of the population

      • then multiply by the size of the sample

      • the size of the stratum just means how many total members are in the stratum

  • Once you know how many members to choose from each stratum

    • those members should be chosen randomly from all the members of the stratum

    • i.e. take a 'simple random sample' of the correct size from each stratum
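Putting the formula and the random selection together, a stratified sample could be sketched in Python like this. The function name `stratified_sample`, the three age-group strata and their sizes are all hypothetical. Rounding to the nearest whole number is also an assumption added here for cases where the formula does not give an exact integer.

```python
import random

def stratified_sample(strata, sample_size, rng=random):
    """strata maps each stratum name to the list of its members."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # number from stratum = size of stratum / size of population x size of sample
        # (rounding is an assumption for cases that do not divide exactly)
        k = round(len(members) * sample_size / population_size)
        # Take a simple random sample of that size from within the stratum.
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population of 200 people in three non-overlapping age groups.
strata = {
    "under 18": [f"U{i}" for i in range(40)],
    "18 to 64": [f"A{i}" for i in range(120)],
    "65 and over": [f"S{i}" for i in range(40)],
}
sample = stratified_sample(strata, 20)
print({name: len(chosen) for name, chosen in sample.items()})
# {'under 18': 4, '18 to 64': 12, '65 and over': 4}
```

Here 40/200 of the population is under 18, so 40/200 of the sample of 20 (4 people) comes from that stratum.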

How do I choose a stratified sample based on more than one category?

  • It is possible that the strata to be used will be based on more than one category

  • For example the 900 people working for a large company

    • could be divided into managers and employees

    • but could also be divided according to whether they usually walk or bike to work, drive to work, or use public transport

                walk or bike    drive    public transport
employees       180             225      414
managers        27              45       9

  • In a case like this each stratum for a sample could be based on two categories

    • e.g. 'employees who drive to work', 'managers who use public transport', etc.

  • Once the strata have been decided, find the number in each stratum in the usual way

    • e.g. if you want a total sample of 100 people (out of the 900 people in the table)

      • the number of employees who walk or bike in the sample would be

        $\frac{180}{900} \times 100 = 20$

      • the number of managers who walk or bike in the sample would be

        $\frac{27}{900} \times 100 = 3$

      • etc.

Exam Tip

  • After you calculate the numbers to be chosen from each stratum

    • add them up and make sure they equal the total size of the sample you were looking for

    • This is a good way to spot possible mistakes in your working
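The two-category example (900 people, sample of 100) and the check suggested in the exam tip can be worked through in a short Python sketch. The variable names are made up for this illustration; the stratum sizes are the ones from the company table.

```python
# Number of people in each stratum, taken from the company table.
strata_sizes = {
    ("employees", "walk or bike"): 180,
    ("employees", "drive"): 225,
    ("employees", "public transport"): 414,
    ("managers", "walk or bike"): 27,
    ("managers", "drive"): 45,
    ("managers", "public transport"): 9,
}
sample_size = 100
population_size = sum(strata_sizes.values())   # 900

# number from stratum = size of stratum / size of population x size of sample
# (integer division is safe here because every value divides exactly)
numbers_to_choose = {
    stratum: size * sample_size // population_size
    for stratum, size in strata_sizes.items()
}

for stratum, number in numbers_to_choose.items():
    print(stratum, number)

# Exam tip check: the numbers should add up to the intended sample size.
print(sum(numbers_to_choose.values()) == sample_size)  # True
```

The six numbers come out as 20, 25, 46, 3, 5 and 1, which add up to 100 as required.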

Worked Example

In Dafydd's school there are 636 students, 36 teachers, and 48 non-teaching staff. For a research project he is working on, Dafydd wishes to choose a stratified sample of 60 people from the students and staff at the school.

Calculate the numbers of students, teachers and non-teaching staff that Dafydd should include in his sample.

First we need to find the total number of people in the population
(Here the population is all the students and staff in the school)

$636 + 36 + 48 = 720$

To find the number from each group use $\text{number from stratum} = \dfrac{\text{size of stratum}}{\text{size of population}} \times \text{size of sample}$

students: $\frac{636}{720} \times 60 = 53$

teachers: $\frac{36}{720} \times 60 = 3$

non-teaching staff: $\frac{48}{720} \times 60 = 4$

Check to make sure those numbers add up to 60:
53+3+4=60

53 students, 3 teachers and 4 non-teaching staff


Author: Roger

Roger's teaching experience stretches all the way back to 1992, and in that time he has taught students at all levels between Year 7 and university undergraduate. Having conducted and published postgraduate research into the mathematical theory behind quantum computing, he is more than confident in dealing with mathematics at any level the exam boards might throw at you.