

The case against perfection in the mean: Why it is time for an individualised approach to evidence for education

Presented by Dr ZhiMin Xiao, Graduate School of Education (UoE)

Analyses of educational interventions need to produce evidence that is relevant to specific groups of students. When a group is not the target population of an intervention, any analysis restricted to that group is called a subgroup analysis. Such analyses are often regarded as statistical malpractice because their findings tend to be underpowered and unreliable: prone to overinterpretation at best and misleading at worst.


At the same time, researchers who do not conduct relevant subgroup analyses are criticised for generating irrelevant evidence and accused of wasting research money.

In this study, we estimated intervention effects for Free School Meal (FSM) pupils in English schools, who form a pre-specified subgroup in most educational trials funded by the Education Endowment Foundation (EEF) in England. Following established approaches to subgroup analysis in trial data, we first ran a treatment-FSM interaction test for every outcome (84 in total) from different, independent trials to see whether effects differed to a statistically significant degree between FSM and Non-FSM students. We then took an extra step and calculated separate effect sizes within the two subgroups defined by the binary FSM variable in 48 RCTs. Finally, we examined the p-values from the interaction tests and, in every study, compared the overall effect size for FSM and Non-FSM pupils combined with the two separate subgroup estimates. We found that conventional interaction tests often produce self-contradictory results, which makes the estimated effects problematic and urges us to pursue an alternative to subgroup analysis.
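The two steps described above, an interaction test followed by separate subgroup effect sizes, can be sketched on simulated data. This is only an illustration under stated assumptions: the data are synthetic, the function names `interaction_test` and `effect_size` are hypothetical, the model is a plain single-level OLS, and the effect size is a pooled-SD standardised mean difference. It ignores clustering by school and any covariates that the actual trial analyses may have used.

```python
import numpy as np

def interaction_test(y, treat, fsm):
    """OLS of y on treat, fsm and treat*fsm.

    Returns the interaction coefficient, its standard error and t statistic.
    """
    X = np.column_stack([np.ones(len(y)), treat, fsm, treat * fsm])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = (resid @ resid) / dof                      # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta[3], se[3], beta[3] / se[3]

def effect_size(y, treat):
    """Standardised mean difference (treated minus control) with pooled SD."""
    a, b = y[treat == 1], y[treat == 0]
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (
        len(a) + len(b) - 2
    )
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Synthetic trial (assumption): the intervention helps FSM pupils more.
rng = np.random.default_rng(0)
n = 4000
treat = rng.integers(0, 2, n)
fsm = rng.integers(0, 2, n)
y = 0.2 * treat - 0.4 * fsm + 0.3 * treat * fsm + rng.normal(size=n)

coef, se, t_stat = interaction_test(y, treat, fsm)
d_overall = effect_size(y, treat)
d_fsm = effect_size(y[fsm == 1], treat[fsm == 1])
d_non_fsm = effect_size(y[fsm == 0], treat[fsm == 0])

print(f"interaction: coef={coef:.3f}, se={se:.3f}, t={t_stat:.2f}")
print(f"effect sizes: overall={d_overall:.3f}, FSM={d_fsm:.3f}, Non-FSM={d_non_fsm:.3f}")
```

Setting the interaction result beside the overall and subgroup effect sizes is the kind of comparison in which inconsistencies between the test and the separate estimates can surface.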

We therefore propose an individualised approach to treatment effect estimation, which employs both statistical and machine learning techniques to predict the differences between factual and counterfactual outcomes for individuals who could have been allocated to different intervention arms. The predicted differences, or Individualised Treatment Effects (ITE), can help us assess how many individuals benefited from a past intervention, and by how much. They can also be used as what we call a Pupil Advantage Index (PAI) to assist in decision-making or planning for policy interventions yet to be implemented, as they predict how much an individual is likely to benefit from a proposed intervention under contrasting scenarios, given what we already know about the individual and an existing body of evidence on the same topic, ideally from similar trials.
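One common way to predict differences between factual and counterfactual outcomes is to fit a separate outcome model in each trial arm and then predict both outcomes for every individual. The sketch below does this with ordinary least squares on simulated data. It is an assumption-laden illustration, not the method used in the study: the covariates (`prior`, `fsm`), the helper names and the data-generating process are all hypothetical, and any other learner could replace the OLS fit.

```python
import numpy as np

def fit_ols(X, y):
    """Fit OLS with an intercept and return the coefficient vector."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def predict_ols(beta, X):
    """Predict outcomes from fitted OLS coefficients."""
    return np.column_stack([np.ones(len(X)), X]) @ beta

def estimate_ite(X, treat, y):
    """Fit one outcome model per arm, then take the difference in predictions.

    For each individual this contrasts the predicted outcome under
    treatment with the predicted outcome under control.
    """
    beta_treated = fit_ols(X[treat == 1], y[treat == 1])
    beta_control = fit_ols(X[treat == 0], y[treat == 0])
    return predict_ols(beta_treated, X) - predict_ols(beta_control, X)

# Synthetic trial (assumption): benefit varies with FSM status and prior attainment.
rng = np.random.default_rng(1)
n = 4000
prior = rng.normal(size=n)            # hypothetical prior attainment score
fsm = rng.integers(0, 2, n)           # hypothetical FSM indicator
treat = rng.integers(0, 2, n)         # random allocation to arms
X = np.column_stack([prior, fsm])

true_ite = 0.2 + 0.4 * fsm - 0.2 * prior
y = 0.5 * prior - 0.3 * fsm + treat * true_ite + rng.normal(size=n)

ite_hat = estimate_ite(X, treat, y)
share_benefiting = (ite_hat > 0).mean()   # how many are predicted to benefit
mean_benefit = ite_hat.mean()             # and by how much, on average

print(f"predicted share benefiting: {share_benefiting:.2f}")
print(f"mean predicted ITE: {mean_benefit:.3f}")
```

Because the data are simulated, the true effects are known and the predictions can be checked against them. In a real trial only one outcome per pupil is observed, so the predicted differences cannot be validated directly.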

We tested the approach described above on real RCT data from trials of different designs and scales. We demonstrate that conventional regression models can sometimes outperform machine learning algorithms in prediction accuracy, but they can be too deterministic to be specific if they are not carefully pre-specified in the first place. By contrast, machine learning techniques such as Random Forests are more responsive to individual differences. More importantly, the individualised approach leaves ample room for the knowledge of experts and practitioners to play a significant role in designing trials and interpreting data for better educational outcomes. We argue that it is time to focus on individuals and to answer “what if” questions using the predictive methods of statistical learning.


Booking is not necessary; however, please arrive early, as seating is limited to 30.