On Wednesday 8 May 2019, the RSS Medical Section and Leeds/Bradford local group jointly hosted a meeting at the University of Leeds entitled ‘Solutions in Causal Inference’, with a focus on observational data analysis. Speakers included members of the ‘Causal inference in real world data’ group from Leeds Institute for Data Analytics at the University of Leeds, and a visiting keynote speaker from Radboud University Medical Center in the Netherlands.
Kellyn Arnold, a PhD student at the University of Leeds, started proceedings with an overview of some of the basic principles of causal inference, entitled ‘Do we fully understand the challenges of introducing machine learning into health research? Lessons from our (poor) understanding of linear modelling’. Machine learning methods are increasingly prominent and widely used in health research, but the focus is commonly on prediction rather than causal inference. Directed acyclic graphs (DAGs) are non-parametric causal models that represent the assumed causal structures giving rise to statistical associations; crucially, ‘association does not equal causation’. Prediction is concerned with estimating the likely value (or risk) of an outcome given information from one or more observed factors, while causal inference is concerned with estimating the likely change in the value of an outcome due to a change in a particular factor. Modern causal inference methods should be integrated with machine learning approaches to harness the full power of these methods.
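The point that association does not equal causation can be illustrated with a small simulation (a sketch of my own, not taken from the talk): a confounder Z causes both X and Y, so X and Y are strongly associated even though X has no causal effect on Y.

```python
# Hypothetical illustration: confounded structure Z -> X, Z -> Y,
# with NO direct effect of X on Y.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

z = rng.normal(size=n)          # common cause (confounder)
x = 2 * z + rng.normal(size=n)  # X depends on Z only
y = 3 * z + rng.normal(size=n)  # Y depends on Z only; X has no effect on Y

# Marginal association: X and Y are strongly correlated...
print(np.corrcoef(x, y)[0, 1])

# ...but the association vanishes once Z is adjusted for
# (partial correlation via residuals from linear fits on Z).
rx = x - np.polyval(np.polyfit(z, x, 1), z)
ry = y - np.polyval(np.polyfit(z, y, 1), z)
print(np.corrcoef(rx, ry)[0, 1])
```

A purely predictive model would happily use X to predict Y here; only the causal structure tells us that intervening on X would change nothing.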
Peter Tennant, academic fellow in health data science at the University of Leeds, gave the second presentation, titled ‘Analyses of change: A causal inference perspective’. Causal diagrams help to identify bias and assumptions, and encourage more thoughtful and transparent research. Identification and estimation are separated by first defining the estimand (what you seek) before building the estimator (how you will get there) and then seeking the best estimate of the estimand. ‘Change-scores’ are composite variables constructed from repeated measures of a single parent variable; their analysis conflates the information and causal pathways of the baseline and follow-up parent variables, leading to inferential bias and contradictory results. Future observational studies should not conduct analyses of change-scores, and existing studies that have conducted change-score analyses should be viewed with caution. Lord’s paradox was also explained in these terms!
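The contradictory results that change-score analyses can produce are easy to reproduce in simulation (my own hedged sketch of the standard Lord's-paradox setting, not code from the talk): two groups differ at baseline, follow-up regresses towards the mean, and there is no group effect conditional on baseline, yet the change-score analysis reports one.

```python
# Hypothetical illustration of Lord's paradox: change-score analysis
# vs follow-up adjusted for baseline (ANCOVA-style), on data where
# the true group effect given baseline is zero.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

g = rng.integers(0, 2, size=n)    # group indicator
b = 2.0 * g + rng.normal(size=n)  # baseline differs by group
f = 0.5 * b + rng.normal(size=n)  # follow-up: no group effect given baseline

# Analysis 1: regress the change-score (f - b) on group.
X1 = np.column_stack([np.ones(n), g])
beta_change, *_ = np.linalg.lstsq(X1, f - b, rcond=None)

# Analysis 2: regress follow-up on group, adjusting for baseline.
X2 = np.column_stack([np.ones(n), g, b])
beta_ancova, *_ = np.linalg.lstsq(X2, f, rcond=None)

print(beta_change[1])  # substantial spurious 'group effect'
print(beta_ancova[1])  # near zero: no effect given baseline
```

The two analyses answer different causal questions, which is exactly why an estimand should be defined before an estimator is chosen.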
The keynote presentation was given by Johannes Textor, assistant professor in tumor immunology at Radboud University Medical Center in the Netherlands, on ‘Making DAG-based causal inference more quantitative’, or ‘Test your DAGs!’ There are simple questions that machine learning cannot answer from data alone: Simpson’s paradox, for example, demonstrates an effect reversal that depends on model adjustment. Causal structures must be considered, and DAGs can be used to resolve the paradox. DAGs should be tested, falsified and refined. There are many types of testable implications, particularly conditional independence (using d-separation). Examples were given of testing collider and fork models, with regression or stratification as testing strategies. A more realistic example used the ‘Adult census income’ dataset. New ways of model testing are still being discovered, e.g. instrumentality tests for models that use instrumental variables.
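The idea of testing a DAG's conditional-independence implications can be sketched as follows (an assumed regression-based illustration in the spirit of d-separation, not Textor's own code): a fork X ← Z → Y implies X and Y are independent given Z, while a collider X → C ← Y implies X and Y are marginally independent but become dependent once C is conditioned on.

```python
# Hypothetical illustration: regression-based tests of the
# conditional-independence implications of a fork and a collider.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

def partial_corr(a, b, c):
    """Correlation of a and b after regressing out c from each."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

# Fork: Z causes both X and Y.
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z + rng.normal(size=n)
print(np.corrcoef(x, y)[0, 1])  # marginally dependent
print(partial_corr(x, y, z))    # near zero: independent given Z

# Collider: X and Y both cause C.
x2 = rng.normal(size=n)
y2 = rng.normal(size=n)
c = x2 + y2 + rng.normal(size=n)
print(np.corrcoef(x2, y2)[0, 1])  # near zero: marginally independent
print(partial_corr(x2, y2, c))    # nonzero: conditioning opens the path
```

If observed data violated these implied (in)dependencies, the proposed DAG would be falsified and should be refined, which is the essence of ‘Test your DAGs!’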
A recording of the session is available on the RSS Medical Section YouTube Channel: https://t.co/hKJkKfxGuZ