Why Correlation Does Not Mean Causation: The Statistics Trap That Catches Everyone
The A Level Statistics mistake that looks simple… until the exam asks you to explain it
One of the most common traps in A Level Statistics usually has nothing to do with the calculation.
Many students can calculate a correlation coefficient. They can draw a scatter graph. They can add a line of best fit. They can say whether the correlation is positive, negative, strong, weak, or close to zero.
Then the exam question asks:
“What can you conclude?”
And that is where the danger begins.
Because statistics is not just about pressing buttons on a calculator. It is about interpreting evidence carefully.
The big idea is this:
Correlation shows that two variables are related. It does not prove that one variable causes the other.
That may sound simple, but it is one of the most important ideas in all of statistics.
The ice cream and drowning example
Here is the classic example.
In summer, ice cream sales increase.
In summer, the number of drowning incidents also tends to increase.
So, does eating ice cream cause drowning?
Of course not.
That would be a ridiculous conclusion. The real explanation is that both are affected by a third factor:
hot weather.
When the weather is hot, more people buy ice cream. Also, more people go swimming, visit beaches, go boating, or spend time near water. So the two variables may rise together, but that does not mean one causes the other.
This is the trap.
A graph may show a pattern.
The numbers may look convincing.
The correlation coefficient may be close to 1.
But that still does not prove causation.
What does correlation actually mean?
Correlation measures the strength and direction of the (usually linear) relationship between two variables.
For example, we might collect data on:
- height and shoe size
- hours of revision and test score
- temperature and ice cream sales
- engine size and fuel consumption
- age of a car and resale value
- time spent on a phone and hours of sleep
If the points on a scatter graph form a clear pattern, we may say there is correlation.
But correlation only describes what appears to happen in the data.
It does not explain why it happens.
That is the important distinction.
Positive correlation
A positive correlation means that as one variable increases, the other variable tends to increase as well.
For example:
As revision time increases, test scores may tend to increase.
This seems sensible. A student who revises for longer may perform better.
But we still have to be careful.
The correlation does not automatically prove that simply sitting with a textbook for more hours caused the better result. Other factors may be involved:
- the quality of the revision
- whether the student practised exam questions
- prior knowledge
- how well they slept
- whether they had support from a teacher or tutor
- how difficult the test was
- how anxious they felt on the day
So even a sensible correlation must be interpreted carefully.
A good A Level answer would not say:
“More revision causes higher marks.”
A better answer would say:
“There appears to be a positive correlation between revision time and marks, but this does not prove causation. Other factors such as revision quality, prior ability, sleep, or exam technique may also affect the result.”
The second answer shows much better statistical reasoning.
Negative correlation
A negative correlation means that as one variable increases, the other variable tends to decrease.
For example:
As the age of a car increases, its resale value tends to decrease.
This is a negative correlation.
Older cars are often worth less. Again, that makes sense.
But even here, we have to avoid being too simplistic. Age may be important, but it is not the only factor. A car’s value may also depend on:
- mileage
- condition
- service history
- rarity
- fuel type
- brand
- demand in the second-hand market
So the age of the car may be strongly linked to value, but it may not be the whole explanation.
Statistics gives us evidence. It does not remove the need to think.
Zero or very weak correlation
Sometimes there is no clear pattern.
For example:
A student’s shoe size and their favourite music style.
There is no sensible reason to expect these to be connected. If we collected data, the scatter graph would probably look like a random cloud of points.
In an exam, students need to describe this carefully.
They might write:
“There appears to be little or no correlation between the two variables.”
This is better than saying:
“There is definitely no relationship.”
Why?
Because sample data is limited. We usually do not have perfect knowledge of an entire population. Statistics is often about making careful judgements from incomplete evidence.
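A quick simulation can illustrate why "definitely" is too strong. The sketch below is only an illustration under stated assumptions: the two variables are random numbers generated independently (a numerical stand-in, not real shoe sizes or music preferences), and the sample size of 10, the 200 repeats and the helper name pearson_r are choices made here for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


rng = random.Random(1)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 10   # students per sample (illustrative assumption)
NUM_SAMPLES = 200  # number of separate samples drawn (illustrative assumption)

sample_rs = []
for _ in range(NUM_SAMPLES):
    # The two variables are generated independently,
    # so there is no real link between them.
    xs = [rng.gauss(0, 1) for _ in range(SAMPLE_SIZE)]
    ys = [rng.gauss(0, 1) for _ in range(SAMPLE_SIZE)]
    sample_rs.append(pearson_r(xs, ys))

print("smallest r in any sample:", round(min(sample_rs), 2))
print("largest r in any sample: ", round(max(sample_rs), 2))
```

Even though the two variables are unrelated by construction, individual small samples typically give values of r some way from zero, in both directions. One small sample therefore cannot show that there is definitely no relationship, just as it cannot show that there definitely is one.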
The hidden third variable: the confounding variable
A confounding variable is an extra variable that influences both of the variables being studied, and so may create or distort the apparent relationship between them.
This is one of the main reasons why correlation does not prove causation.
In the ice cream example:
- Variable 1: ice cream sales
- Variable 2: drowning incidents
- Confounding variable: hot weather
Hot weather affects both.
This makes it look as though ice cream sales and drowning incidents are directly connected, when actually they are both responding to something else.
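The effect of a confounding variable can be sketched with made-up numbers. Everything in the example below is an assumption chosen for illustration, not real data: the temperature range, the formulas linking temperature to ice cream sales and to a "drowning risk index", the noise levels and the helper name pearson_r. The point is that neither outcome is calculated from the other, yet they still correlate.

```python
import math
import random


def pearson_r(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


rng = random.Random(42)  # fixed seed so the run is repeatable
NUM_DAYS = 2000          # number of simulated days (illustrative assumption)

temps, ice_cream_sales, drowning_index = [], [], []
for _ in range(NUM_DAYS):
    temp = rng.uniform(10, 35)  # daily temperature in degrees C (invented)
    # Each outcome depends on temperature plus its own random noise.
    # Neither formula uses the other outcome.
    temps.append(temp)
    ice_cream_sales.append(20 * temp + rng.gauss(0, 50))
    drowning_index.append(0.3 * temp + rng.gauss(0, 1))

# Correlation across all days, where temperature varies a lot.
r_all = pearson_r(ice_cream_sales, drowning_index)

# Hold the confounder roughly fixed: keep only days between 24 and 26 degrees.
similar = [(s, d) for t, s, d in zip(temps, ice_cream_sales, drowning_index)
           if 24 <= t <= 26]
r_similar = pearson_r([s for s, _ in similar], [d for _, d in similar])

print("r across all days:            ", round(r_all, 2))
print("r on days of similar weather: ", round(r_similar, 2))
```

In this simulation the correlation across all days is strong, but it largely disappears once only days of similar temperature are compared. That is the signature of a confounding variable: the link between the two outcomes exists only because both follow the weather.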
This idea appears again and again in real life.
Example 1: Social media use and anxiety
Suppose a study finds that students who spend more time on social media also report higher anxiety levels.
It would be tempting to say:
“Social media causes anxiety.”
But that may be too simple.
Possible explanations include:
- social media may increase anxiety
- anxious students may use social media more often
- poor sleep may increase both social media use and anxiety
- exam pressure may increase anxiety and phone use
- loneliness may lead to more online activity and more anxiety
The correlation may be real, but the cause is not automatically clear.
This is exactly the kind of example where A Level students need to use careful language.
Example 2: Coffee and productivity
Suppose office workers who drink more coffee complete more work.
Does coffee cause productivity?
Possibly. But other explanations exist.
Maybe:
- people with demanding jobs drink more coffee
- productive people work longer hours and therefore drink more coffee
- morning people drink coffee and are already more alert
- workplace culture affects both coffee drinking and output
Again, the correlation may be interesting, but it is not proof.
Example 3: Class size and exam results
Suppose schools with smaller classes achieve better exam results.
Does a smaller class size cause better results?
It may help, but we need to be cautious.
Other factors may include:
- school funding
- parental support
- prior attainment
- quality of teaching
- access to resources
- attendance
- student motivation
This is a good example for parents, because it shows how educational statistics can be misleading when used too casually.
A headline might say:
“Smaller classes improve results.”
But a statistician would ask:
“What else could explain the pattern?”
That question is at the heart of good statistical thinking.
Example 4: Exercise and life expectancy
Suppose people who exercise more tend to live longer.
This sounds reasonable, and there may well be a causal link. But even here, we still have to think.
People who exercise more may also:
- eat more healthily
- smoke less
- have higher income
- have better access to healthcare
- have more leisure time
- already be healthier to begin with
This does not mean exercise is unimportant. It means that a correlation on its own is not enough to establish cause and effect.
To establish causation properly, researchers need more careful study designs, such as controlled experiments in which participants are randomly assigned to groups.
Why this matters in A Level Maths
In A Level Statistics, students are often asked to interpret data, not just calculate with it.
This may involve:
- scatter diagrams
- correlation coefficients
- regression lines
- hypothesis tests
- sampling
- large data sets
- real-world contexts
Students may lose marks because their conclusion is too strong.
For example, they may write:
“This proves that temperature causes ice cream sales to increase.”
Depending on the context, that may be too definite.
A safer and more statistically correct answer might be:
“The data suggests a positive association between temperature and ice cream sales. This may indicate that higher temperatures are linked with increased ice cream sales, but further evidence would be needed to establish causation.”
That is the difference between a casual answer and an A Level answer.
Exam language students should use
A Level examiners like careful, precise language.
Useful phrases include:
“There appears to be…”
“The data suggests…”
“There is evidence of an association between…”
“This does not necessarily imply causation…”
“A possible confounding variable is…”
“Other factors may have influenced the result…”
“Further investigation would be needed…”
These phrases are not just decoration. They show that the student understands what statistics can and cannot prove.
Phrases students should avoid
Students should be careful with words such as:
- proves
- definitely
- always
- causes
- guarantees
- must mean
For example, this is usually too strong:
“The graph proves that more screen time causes worse sleep.”
A better answer would be:
“The graph suggests a negative association between screen time and sleep duration. However, this does not prove that screen time causes reduced sleep, as other factors such as stress, workload, or bedtime routine may also be involved.”
This kind of answer shows maturity.
It also sounds like a statistician.
Correlation coefficient: what does it really tell us?
At A Level, students often calculate or interpret the product moment correlation coefficient, usually written as r.
The value of r lies between -1 and 1.
- r close to 1 means strong positive linear correlation
- r close to -1 means strong negative linear correlation
- r close to 0 means little or no linear correlation
But here is the key point:
Even if r is very close to 1 or -1, this still does not prove causation.
A strong correlation may be very useful. It may allow predictions. It may suggest an important relationship. But it does not, by itself, explain the cause.
This is where many students make mistakes.
They see a strong value of r and assume they are allowed to make a strong causal statement.
They are not.
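For reference, one common way of writing the product moment correlation coefficient is shown below. The summary-statistic notation (S with subscripts xx, yy and xy) is an assumption here, since this article does not define it, but it is widely used in A Level textbooks and formula booklets.

```latex
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}},
\qquad
S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}),
\quad
S_{xx} = \sum (x_i - \bar{x})^2,
\quad
S_{yy} = \sum (y_i - \bar{y})^2
```

Notice that swapping x and y leaves the value of r unchanged. The formula treats the two variables symmetrically, so r alone cannot say which variable, if either, is the cause.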
Regression lines: useful, but dangerous if misunderstood
Regression lines are used to model the relationship between two variables.
For example, we might use a regression line to estimate a student’s test score based on revision time.
This can be useful, but students need to remember three important warnings.
1. The prediction is only an estimate
A regression line gives a model, not a guarantee.
A student who revises for 10 hours is not guaranteed to get a specific mark.
2. The model may not apply outside the data range
This is called extrapolation.
If the data only includes students who revised between 1 and 10 hours, it may be dangerous to use the model to predict the score for someone who revised for 50 hours.
The relationship may not continue in the same way.
3. The regression line does not prove causation
Even if the line fits the data well, it still only describes the relationship in the data.
It does not prove why the relationship exists.
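The extrapolation warning can be made concrete with a short sketch. It fits a least-squares regression line to the ten-student data set used in the classroom example in this article, then predicts a score for 50 hours of revision. The function name predict and the idea that the test is marked out of 100 are assumptions made here for illustration.

```python
# Data from the classroom example in this article.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
scores = [28, 35, 42, 48, 55, 63, 68, 75, 79, 84]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Least-squares regression of score on hours: score = intercept + slope * hours
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
s_xx = sum((x - mean_x) ** 2 for x in hours)
slope = s_xy / s_xx                    # roughly 6.33 marks per extra hour
intercept = mean_y - slope * mean_x    # roughly 22.87


def predict(h):
    """Estimated score for h hours of revision, according to the fitted line."""
    return intercept + slope * h


print("Prediction at 5.5 hours (inside the data range):", round(predict(5.5), 1))
print("Prediction at 50 hours (far outside the range): ", round(predict(50), 1))
```

Inside the range of 1 to 10 hours the line gives a plausible estimate. At 50 hours it predicts a score of roughly 340, which would be impossible if the test were marked out of 100. The arithmetic is correct; the model simply has no evidence about what happens that far outside the data.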
A simple classroom example
Imagine a tutor collects data from ten students:
| Hours revised | Test score |
|---|---|
| 1 | 28 |
| 2 | 35 |
| 3 | 42 |
| 4 | 48 |
| 5 | 55 |
| 6 | 63 |
| 7 | 68 |
| 8 | 75 |
| 9 | 79 |
| 10 | 84 |
This shows a clear positive correlation.
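To put a number on "clear", the correlation coefficient for this table can be computed directly. This is a minimal sketch using only the ten data points above; the variable names are chosen here for illustration.

```python
import math

# The ten students from the table above.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
scores = [28, 35, 42, 48, 55, 63, 68, 75, 79, 84]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Summary statistics used in the product moment correlation coefficient.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
s_xx = sum((x - mean_x) ** 2 for x in hours)
s_yy = sum((y - mean_y) ** 2 for y in scores)

r = s_xy / math.sqrt(s_xx * s_yy)
print("r =", round(r, 3))  # approximately 0.998
```

A value of r this close to 1 describes a very strong positive linear pattern in these ten data points. It still says nothing about how each student revised, which is why the wording of the conclusion matters.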
It would be reasonable to say:
“Students who revised for longer tended to achieve higher test scores.”
But it would be too strong to say:
“Every extra hour of revision caused the mark to increase.”
Why?
Because not all revision is equal.
One student may spend three hours carefully working through past paper questions and correcting mistakes. Another may spend three hours copying notes while half-watching videos on their phone.
The time is the same. The quality is not.
This is one of the reasons why statistics needs interpretation.
What parents need to know
For parents supporting students through A Level Maths, this topic is worth understanding.
A Level Statistics is not just about arithmetic. It is about judgement.
Students need to learn how to:
- interpret graphs
- question conclusions
- identify missing information
- spot misleading claims
- understand uncertainty
- explain results clearly
This is also why statistics is so valuable beyond school.
Every day, we are surrounded by claims based on data:
- “This diet improves concentration.”
- “This school gets better results.”
- “This app improves productivity.”
- “This revision method doubles success.”
- “This lifestyle choice increases happiness.”
Some claims may be true. Some may be exaggerated. Some may confuse correlation with causation.
A student who understands statistics is better equipped to question the world intelligently.
Why students often find this difficult
Many students find this topic harder than expected because it feels less mechanical than pure maths.
In pure maths, a question may have a clear method:
Differentiate this.
Solve this equation.
Find this integral.
Prove this identity.
But statistics often asks:
What does this mean?
Is this conclusion justified?
What assumptions are being made?
What else could explain the result?
That requires a different kind of thinking.
Some students are good at calculations but weaker at written interpretation. Others understand the idea verbally but struggle to phrase it in exam language.
That is why this topic is ideal for one-to-one teaching. A tutor can help the student move from:
“I know what I mean…”
to:
“I can write it clearly enough to get the mark.”
How to answer exam questions on correlation
Here is a simple structure students can use.
Step 1: Describe the relationship
Say whether the correlation is positive, negative, strong, weak, or absent.
Example:
“There appears to be a strong positive correlation between hours revised and test score.”
Step 2: Avoid claiming proof
Do not say it proves one variable causes the other.
Example:
“However, this does not prove that revision time alone caused the higher scores.”
Step 3: Suggest another factor
Identify a possible confounding variable.
Example:
“Other factors such as revision quality, prior ability, teaching support, sleep, or exam technique may also affect performance.”
Step 4: Use cautious language
Use “suggests”, “appears”, “is associated with”, or “may”.
Example:
“The data suggests an association, but further evidence would be needed before making a causal conclusion.”
This four-step structure can turn a vague answer into a strong statistical explanation.
Worked example: screen time and sleep
Suppose an exam question says:
A group of students recorded their daily screen time and the number of hours they slept. The data showed a negative correlation.
A weak answer might be:
“Screen time causes students to sleep less.”
A better answer:
“There appears to be a negative correlation between screen time and hours of sleep, meaning that students with higher screen time tended to sleep for fewer hours. However, this does not prove that screen time caused the reduction in sleep. Other factors such as stress, homework, caffeine intake, bedtime routine, or general lifestyle may also affect sleep.”
This answer is much stronger because it describes the pattern and avoids overclaiming.
Worked example: exercise and exam performance
Suppose a study finds that students who exercise more also tend to get higher exam results.
A poor answer:
“Exercise improves exam results.”
A better answer:
“The data suggests a positive association between exercise and exam results. However, this does not necessarily mean that exercise caused the higher results. Students who exercise regularly may also have better routines, sleep patterns, motivation, or general wellbeing, which could also influence exam performance.”
This is the sort of answer that shows real understanding.
A useful phrase to remember
A simple phrase students can remember is:
“Correlation describes a pattern. Causation explains a reason.”
That is the heart of the topic.
Correlation tells us that two things appear to move together.
Causation says that one thing directly affects the other.
Those are not the same.
Why statistics can mislead us
Statistics is powerful, but it can also be misused.
Sometimes this happens by accident. Sometimes people simply do not understand the limitations of the data.
But sometimes statistics is used deliberately to make a claim sound stronger than it really is.
A graph can be persuasive.
A percentage can sound impressive.
A correlation can look scientific.
But a good student asks:
- What data was collected?
- How large was the sample?
- Was the sample representative?
- Are there outliers?
- Is the relationship linear?
- Could there be a confounding variable?
- Does the evidence actually support the conclusion?
This is why A Level Statistics is such an important part of Maths.
It teaches students not just how to calculate, but how to think.
How private tuition can help with this topic
In lessons, this is a topic where discussion is just as important as calculation.
A student may be able to find the value of r, but still not know what to write afterwards.
That is where guided practice helps.
A useful lesson might include:
- interpreting several scatter graphs
- comparing strong and weak correlations
- writing exam-style conclusions
- identifying confounding variables
- correcting badly worded answers
- discussing real examples from education, health, business, and psychology
- practising calculator techniques for correlation and regression
The aim is not just to get the answer.
The aim is to know what the answer means.
That is often the difference between a student who can “do the method” and a student who can gain the final interpretation marks.
Conclusion: the graph is only the beginning
Correlation is one of the most useful ideas in statistics, but it is also one of the easiest to misuse.
A scatter graph may show a relationship.
A correlation coefficient may measure that relationship.
A regression line may help us make predictions.
But none of these automatically proves cause and effect.
That is the lesson students need to remember.
Correlation can suggest a connection. It can guide further investigation. It can help us make predictions. But it cannot, by itself, prove causation.
So the next time you see two variables rising together, remember the ice cream example.
Ice cream sales rise.
Drowning incidents rise.
But ice cream is not pushing anyone into the sea.
Statistics is not just about numbers.
It is about learning to be careful with conclusions.
And that is exactly why it matters.
