On Monday 23 September 2019, the RSS Social Statistics Section held a meeting on learning more from the practice of educational trials, focusing on methodological developments.
Zhi Min Xiao of the University of Exeter gave a talk titled 'The case against perfection in the mean: From average to individualised treatment effect in randomised controlled trials for education'.
He opened proceedings by introducing the problems of subgroup analysis, a practice as common in education trials as in clinical trials. He compared the results of interaction models with those of separate subgroup analyses for children eligible for free school meals (FSM) versus non-FSM children across several education RCTs, concluding that conventional interaction tests often produce self-contradictory results.
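To make the contrast concrete, here is a minimal sketch, not the speaker's code, of the two approaches applied to the same data; the FSM indicator, effect sizes, and outcomes are all simulated purely for illustration.

```python
# Two common ways of testing whether a treatment effect differs by free
# school meal (FSM) eligibility, applied to the same simulated trial.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
    "treat": rng.integers(0, 2, n),   # 1 = intervention arm (randomised)
    "fsm": rng.integers(0, 2, n),     # 1 = FSM-eligible pupil
})
# Simulated outcome: small average effect, slightly larger for FSM pupils
df["score"] = 0.1 * df.treat + 0.1 * df.treat * df.fsm + rng.normal(0, 1, n)

# 1. Interaction model: one regression, effect modification tested directly
interaction = smf.ols("score ~ treat * fsm", data=df).fit()
print("interaction estimate:", round(interaction.params["treat:fsm"], 3),
      "p =", round(interaction.pvalues["treat:fsm"], 3))

# 2. Separate subgroup analyses: two regressions, effects compared informally
for g, sub in df.groupby("fsm"):
    fit = smf.ols("score ~ treat", data=sub).fit()
    print(f"FSM={g}: effect = {fit.params['treat']:.3f}")
```

The interaction term tests effect modification directly, whereas informally comparing the two subgroup estimates is what can produce the contradictory conclusions described in the talk.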
Zhi Min illustrated the underlying problem of heterogeneous effects with an example from a trial to promote physical activity in children, which showed quite different distributions of change for girls and boys.
To address this, alongside the problem that many education trials return small average effects, he went on to propose an individualised approach to treatment effect variation called the Pupil Advantage Index (PAI). The approach combines statistical and machine learning techniques to predict outcomes for individual pupils using information from more than one trial on the same topic, and has applications in designing future trials as well as in interpreting trial data.
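No PAI code was presented, and the sketch below is not its implementation; it shows only the generic ingredient that individualised approaches of this kind build on, here a standard 'T-learner' (one outcome model per trial arm, with the predicted individual effect taken as the difference) fitted to simulated pooled-trial data.

```python
# Individualised treatment-effect prediction via a T-learner on simulated
# data; NOT the Pupil Advantage Index itself, just the generic technique.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 3000
X = rng.normal(size=(n, 5))        # pupil covariates pooled across trials
t = rng.integers(0, 2, n)          # randomised treatment indicator
# Simulated outcome with an effect that varies with the first covariate
y = X[:, 0] + (0.2 + 0.3 * X[:, 0]) * t + rng.normal(0, 1, n)

# Fit one outcome model per arm, then difference the predictions to get
# a predicted advantage for each individual pupil
m1 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[t == 1], y[t == 1])
m0 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[t == 0], y[t == 0])
ite_hat = m1.predict(X) - m0.predict(X)

print(f"mean predicted effect: {ite_hat.mean():.2f}, "
      f"range: {ite_hat.min():.2f} to {ite_hat.max():.2f}")
```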
The second speaker, Ben Weidmann of Harvard University and the Education Endowment Foundation, gave a talk titled 'Lurking Inferential Monsters? Quantifying selection bias in non-experimental evaluations of school programs'.
After weighing the pros and cons of randomised experiments against observational studies, Ben described a study in which he took 14 randomised control groups from education trials and compared their educational outcomes with those of matched comparison groups of pupils drawn from the National Pupil Database, an administrative database of pupils in England's maintained schools. The resulting differences provide a measure of the bias that would have arisen had the interventions been evaluated observationally. LaLonde (1986) had previously claimed that selection bias in observational studies within the field of education was lower than in other areas. However, as was later clarified in discussion, the widespread use of randomised designs in both the US and, more recently, England rests largely on the premise that selection bias is a problem for education research.
Ben presented two measures of bias: a naïve measure, which attempted no matching or conditioning of any kind, and a matched measure, which used an established method to generate a matched comparison group of pupils. While the naïve measure indicated substantial selection bias, being more indicative of differences between the RCT samples and the wider population, the matched measure of bias had a narrow distribution centred around zero.
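As a rough illustration of the two measures, the sketch below (an assumption-laden stand-in, not Ben's analysis) computes a naive unadjusted comparison and a matched one built by nearest-neighbour matching on a single simulated prior-attainment variable; the real study used an established matching method on richer administrative data.

```python
# Naive versus matched measures of selection bias, on simulated data in
# which RCT pupils differ from the population only in prior attainment.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)

# Simulated RCT control pupils (selected: higher prior attainment) and a
# large administrative population standing in for the National Pupil Database
prior_rct = rng.normal(0.5, 1, 500)
prior_pop = rng.normal(0.0, 1, 20000)

def outcome(prior):
    # Outcome depends only on prior attainment, so matching on it should
    # remove the bias that the naive comparison picks up
    return prior + rng.normal(0, 1, prior.shape)

y_rct, y_pop = outcome(prior_rct), outcome(prior_pop)

# Naive measure: raw outcome difference, no conditioning at all
naive_bias = y_rct.mean() - y_pop.mean()

# Matched measure: pair each RCT pupil with their nearest population
# neighbour on prior attainment, then average the paired differences
nn = NearestNeighbors(n_neighbors=1).fit(prior_pop.reshape(-1, 1))
idx = nn.kneighbors(prior_rct.reshape(-1, 1), return_distance=False).ravel()
matched_bias = (y_rct - y_pop[idx]).mean()

print(f"naive bias: {naive_bias:.2f}, matched bias: {matched_bias:.2f}")
```

Under this simulation the naive measure is substantially non-zero while the matched measure sits near zero, mirroring the pattern the talk reported.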
He cautioned against abandoning randomised evaluations in favour of observational studies, since for certain types of intervention it may be difficult to generate unbiased comparison groups without randomisation. For 'non-radical' interventions, however, he suggested there may be more scope for observational studies than previously thought.
Harvey Goldstein, of the University of Bristol, chaired a discussion which considered whether process evaluation could offer more information to plan these analyses.
Thomas King of the Social Statistics Section committee presented the case for an RSS Working Party on RCTs in education, which might support methodological enquiry and advise on best practice in trial design and analysis. The proposal was broadly supported by those in attendance.