AQA A Level Psychology

Revision Notes

7.3.5 Data Handling


Data Handling

What is Primary Data?

  • Primary data is collected at source e.g. the data obtained from running an experiment, conducting a questionnaire etc.
  • Primary data relates specifically to the researcher's own aim e.g. Loftus & Palmer (1974) collected speed estimates from participants after manipulating the key verb in the question, in order to test the accuracy of eyewitness testimony
  • Primary data is 'fresh' - it has not been previously published 
  • Primary data may be more reliable and valid than secondary data as the researcher has full control over how the data is collected (although this will, of course, depend on how skilled, conscientious and careful the researcher has been throughout the research process)

What is Secondary Data?

  • Secondary data has not been collected at source; it is not original data 
  • Secondary data has been obtained by other researchers, each of whom will have been working to achieve their own specific aim
  • Secondary data is not 'fresh' - it has been previously published 
  • Secondary data allows a researcher who was not involved in the original research process to gain a clear picture of the topic, as they can draw on data derived from multiple sources

What is a Meta-Analysis?

  • A meta-analysis uses secondary data on the quantitative findings of a set of already-published studies
  • The researcher conducts a statistical analysis of these findings (e.g. from lab experiments, correlational studies, questionnaires)
  • Researchers combine the findings from these multiple studies to draw an overall conclusion about the topic in question e.g. the effectiveness of CBT on anxiety disorders
  • The results of a meta-analysis are expressed in terms of effect size: the strength of the relationship between two variables on a numeric scale e.g. the effect size for CBT as a treatment for anxiety is 0.92 which is a large effect size
  • Conducting a meta-analysis may reduce researcher bias as the researcher did not personally conduct the original studies i.e. they have no personal stake in the findings and are simply reporting on general effects
  • Reliability of meta-analyses should be high as a large number of studies can be analysed statistically, leading to robust data (the downside of this is, of course, that the researcher has no idea as to how well controlled each of the studies was in the first place)
  • A meta-analysis allows for trends to be identified by combining the data of lots of smaller studies; such trends would not be identifiable on a one-by-one basis.
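
To make effect size and "combining findings" more concrete, here is a minimal Python sketch. It assumes two things these notes do not state: that effect size is measured with Cohen's d (the difference between two group means divided by their pooled standard deviation), and that studies are combined with a simple average weighted by sample size. Published meta-analyses usually use more sophisticated weighting (e.g. inverse-variance weights), and the function names and study figures below are hypothetical.

```python
import math
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Effect size: difference between two group means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom (n - 1)
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def combine_effect_sizes(effect_sizes, sample_sizes):
    """Hypothetical pooling rule: average the effect sizes, weighting each study by its sample size."""
    weighted_total = sum(d * n for d, n in zip(effect_sizes, sample_sizes))
    return weighted_total / sum(sample_sizes)


# Hypothetical scores for a treatment group and a control group in one study
treatment = [5, 6, 7]
control = [3, 4, 5]
print(cohens_d(treatment, control))  # 2.0

# Hypothetical effect sizes and sample sizes from three published studies
print(round(combine_effect_sizes([0.8, 0.5, 1.1], [40, 100, 60]), 2))  # 0.74
```

With these made-up figures, the unweighted average of the three effect sizes would be 0.8; weighting by sample size pulls the combined estimate down to 0.74 because the largest study (100 participants) found the smallest effect.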

What is Quantitative Data?

  • Quantitative data is data in the form of numbers e.g. 53 out of 125 participants answered 'yes' to this question; 89% of participants were slower in condition A; there is a -0.4 correlation coefficient in this study
  • Quantitative data can be transformed into tables, graphs, charts, percentages, fractions etc.
  • Quantitative data can be statistically analysed using descriptive statistics (e.g. mean, mode, range) and inferential statistics e.g. Mann-Whitney test, Spearman's rho, related-t test
  • Research methods that tend to generate quantitative data include experiments, observations, correlations, and questionnaires/surveys
  • Quantitative data tends to be reliable as it is easy to analyse and compare and the techniques used to collect it tend to be replicable

What is Qualitative Data?

  • Qualitative data is data in the form of words e.g. thoughts, feelings, attitudes, ideas, beliefs
  • Qualitative research methods/techniques include interviews (individual or focus group), diary entries, thematic analysis, naturalistic observations and open-ended questions
  • Qualitative data allows researchers to gain insight into the nature of individual experience and meaning
  • Qualitative data lacks reliability due to the small sample sizes and the subjectivity of the data but it is high in validity as it communicates and analyses real experience, opinion, feelings etc.

What are Measures of Central Tendency?

  • Measures of central tendency - in the form of descriptive statistics - describe the central or typical value of a data set
  • Measures of central tendency are used to summarise large amounts of data into typical mid-point scores e.g. the average score of a data set
  • There are 3 measures of central tendency: the mean, the mode and the median
  • Mean
    • The mean calculates the average score of a data set 
    • The mean indicates what a researcher would expect to find (as the average score) if they were to replicate the procedure of a given study
    • The mean is calculated using the total score of all the values in the data set divided by the number of values in that set 

      For example, to calculate the mean of 4, 6, 7, 9 the researcher would add up the values and then divide this total by the number of values as follows:

      4 + 6 + 7 + 9 = 26

      26 ÷ 4 = 6.5

      The mean = 6.5

    • Advantages of using the mean
      • It is the most sensitive measure of central tendency as it takes all scores in the data set into account
      • It is more likely than other measures of central tendency to provide a representative score i.e. a reliable result  
    • Disadvantages of using the mean
      • It is sensitive to extreme scores (outliers), so it should only be used when the scores are reasonably close together
      • The mean score may not appear in the data set itself: in the example above, the mean is 6.5, which is not one of the original scores
  • Mode
    • The mode calculates the most frequently occurring score in a data set i.e. mode = most often
    • The mode simply identifies the most common score(s) in a data set (some data sets will have no mode, some will have more than one)
    • The mode can be calculated for any data set, and it is the measure used when the researcher cannot use the mean or the median e.g. when nominal data is used. For example:
      • to calculate the mode of 3, 3, 3, 4, 4, 5, 6, 6, 6, 6, 7, 8 the researcher would count the number of times each individual score appears in the data set: 3 appears three times, 4 twice, 6 four times, and 5, 7 and 8 once each
      • the most frequently occurring number is 6
      • the mode = 6
    • Advantages of using the mode
      • It is less likely to be affected by extreme scores
      • It is often useful for the analysis of qualitative data, as this type of data may require the frequencies of themes to be analysed
    • Disadvantages of using the mode
      • A data set may include two modes (bimodal) or more (multi-modal) which blurs the meaning of the data
      • The mode is likely to be of little use on small data sets as it may provide an unrepresentative central measure
  • Median
    • The median is the middle value of a data set (the positional average)
    • The median identifies the exact middle point of a data set once the scores have been put in order
    • The data has to be arranged into numerical order first (with the lowest score at the beginning of the list) 
      • For example, to calculate the median of 20, 43, 56, 78, 92, 67, 48 the researcher would first put the scores in order: 20, 43, 48, 56, 67, 78, 92
      • As the data set has an odd number of scores (7), the median is the single middle value, i.e. the 4th score
      • The median = 56
      • If a data set has an even number of scores, the median is the halfway point between the two middle values: the researcher adds them together and divides by 2 e.g. for 4, 6, 7, 9 the median = (6 + 7) ÷ 2 = 6.5
    • Advantages of using the median
      • It is not affected by extreme scores
      • It is easy to calculate
    • Disadvantages of using the median 
      • It does not necessarily represent a typical average as it does not include all of the data in its calculation i.e. it does not account for extreme scores making it less reliable than the mean
      • It can be laborious to calculate by hand on large data sets, as every score must first be put into numerical order
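
The three measures of central tendency can be sketched in Python using the worked examples from these notes. This is a minimal illustration: the function names are hypothetical, and the "no mode" rule (returning an empty list when no score occurs more than once) is one assumed convention. In practice a researcher would more likely use Python's standard `statistics` module, which provides `mean`, `median` and `multimode`.

```python
def calc_mean(scores):
    # Add up all the scores, then divide the total by how many scores there are
    return sum(scores) / len(scores)


def calc_median(scores):
    # Put the scores into numerical order first (lowest score at the beginning)
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of scores: the median is the single middle score
        return ordered[middle]
    # Even number of scores: halfway point between the two middle scores
    return (ordered[middle - 1] + ordered[middle]) / 2


def calc_modes(scores):
    # Count how many times each score appears in the data set
    counts = {}
    for s in scores:
        counts[s] = counts.get(s, 0) + 1
    highest = max(counts.values())
    # Assumed convention: if no score occurs more than once, there is no mode
    if highest == 1 and len(scores) > 1:
        return []
    # Keep every score that shares the highest count (bimodal / multi-modal data)
    return [s for s, c in counts.items() if c == highest]


print(calc_mean([4, 6, 7, 9]))                            # 6.5
print(calc_median([20, 43, 56, 78, 92, 67, 48]))          # 56
print(calc_modes([3, 3, 3, 4, 4, 5, 6, 6, 6, 6, 7, 8]))   # [6]
```

Returning a list from `calc_modes` lets the same function report one mode, several modes, or none, which matches the point above that some data sets have no mode and some have more than one.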

What are Measures of Dispersion? 

  • Measures of dispersion calculate the spread of scores and how much they vary in terms of how distant they are from the mean or median
  • A data set with low dispersion will have scores that cluster around the measure of central tendency e.g. the mean
  • A data set with high dispersion will have scores that are spread apart from the central measure with much variation among them
  • If a data set contained exactly the same score per participant (e.g. everyone scored 15 out of 20 on a memory test) then the dispersion score would be zero as there would be no variation at all in the scores (plus the mean, mode and median would be identical = 15)
  • The two measures of dispersion covered here are the range and standard deviation
  • Range 
    • The range describes the difference between the lowest and the highest scores in a data set
    • The range provides information as to the gap between highest and lowest scores
    • To calculate the range the researcher would subtract the lowest value from the highest value in the data set
      • For example, to calculate the range of 4, 4, 6, 7, 9, 9 the researcher would take the lowest number (4) from the highest number (9) as follows:
      • 9 - 4 = 5 
      • The range = 5
    • Advantages of using the range
      • It provides a broad overview of the data which can be useful for some research purposes
      • It is easy to calculate
    • Disadvantages of using the range 
      • It highlights the gap between top and bottom scores but provides no information as to all of the other scores in the data set
      • It is not very stable or representative as it is based on just two scores, so it can vary considerably from one sample to another and tends to increase as sample size increases
  • Standard Deviation 
    • Standard deviation calculates how much, on average, a set of scores deviates from the mean
    • Standard deviation provides insight into how clustered or spread out the scores are from the mean
    • A low standard deviation indicates that the scores are clustered tightly around the mean which indicates reliability
    • A high standard deviation indicates that the scores are more spread out from the mean which indicates lower reliability
    • In a normal distribution the scores are spread symmetrically around the mean (i.e. it is not skewed); the standard deviation then describes how tightly or loosely the scores cluster around that mean
    • There are six steps to calculating the standard deviation
      • Calculate the mean
      • Subtract the mean from each score
      • Square each of these differences
      • Add all of the squared differences together
      • Divide this total by the number of scores minus 1 (the result is called the variance)
      • Work out the square root of the variance (using a calculator) to give the standard deviation
    • Advantages of using standard deviation
      • It provides information as to how the scores are distributed across a data set
      • It is more sensitive than the range as it uses all the scores in the data set 
    • Disadvantages of using standard deviation 
      • It can be time-consuming and complicated to calculate (although this is not such a huge consideration in the 21st century as there are multiple digital tools that will do this for the researcher)
      • It can be distorted by extreme outliers i.e. outliers may inflate the standard deviation, giving a misleading representation of the spread of values in the data set
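
The range and the six standard deviation steps can be sketched in Python as follows. This is a minimal illustration with hypothetical function names; it divides by the number of scores minus 1, matching the steps in these notes (the same formula used by `statistics.stdev` in Python's standard library), so it needs at least two scores.

```python
import math


def calc_range(scores):
    # Subtract the lowest score from the highest score
    return max(scores) - min(scores)


def calc_sd(scores):
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to divide by n - 1")
    # Step 1: calculate the mean
    m = sum(scores) / n
    # Steps 2 and 3: subtract the mean from each score and square each difference
    squared_diffs = [(s - m) ** 2 for s in scores]
    # Step 4: add all of the squared differences together
    total = sum(squared_diffs)
    # Step 5: divide by the number of scores minus 1 to give the variance
    variance = total / (n - 1)
    # Step 6: the square root of the variance is the standard deviation
    return math.sqrt(variance)


print(calc_range([4, 4, 6, 7, 9, 9]))   # 5
print(round(calc_sd([4, 6, 7, 9]), 2))  # 2.08
print(calc_sd([15, 15, 15]))            # 0.0
```

For 4, 6, 7, 9 the squared differences from the mean of 6.5 are 6.25, 0.25, 0.25 and 6.25, which total 13; dividing by 3 gives a variance of about 4.33 and a standard deviation of about 2.08. The last line shows the case described earlier: when every participant has the same score, there is no spread and the standard deviation is zero.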


Author: Claire Neeson

Claire has been teaching for 34 years, in the UK and overseas. She has taught GCSE, A-level and IB Psychology which has been a lot of fun and extremely exhausting! Claire is now a freelance Psychology teacher and content creator, producing textbooks, revision notes and (hopefully) exciting and interactive teaching materials for use in the classroom and for exam prep. Her passion (apart from Psychology of course) is roller skating and when she is not working (or watching 'Coronation Street') she can be found busting some impressive moves on her local roller rink.