On 21 March the Emerging Applications section of the RSS hosted a day-long meeting titled ‘Data integration and modelling in observational studies’. The meeting was sponsored by the MRC Methodology-funded grant MR/M025195/1, ‘A general framework to adjust for missing confounders in observational studies’. Around 80 people attended; the main question of the day was how to deal with confounders in order to estimate the direct effects of exposures and risk factors in observational studies.
The day started with Dr Monica Pirani, from Imperial College, who worked as a research associate on the project and gave an overview of its major methodological advances and results. She described how integrating data from different sources can help overcome residual confounding, and showed how the propensity score can be generalised to small-area ecological studies, where it provides a useful tool for summarising large numbers of confounders and for dealing with data sparsity.
The rest of the day was divided into three sessions, focusing on data integration, missing data modelling and causal inference.
On data integration, Professor Lance Waller from Emory University (US) discussed how to move beyond the typical disease mapping model by allowing for uncertainty in the population count (typically treated as a fixed offset); he showed how data sources such as the American Community Survey can be useful in this respect. He also raised issues of confidentiality in small-area studies and discussed how methods borrowed from computer science, such as differential privacy, could potentially be applied in this context. In the next talk, Dr Robert Goudie from the MRC-BSU presented recent work on Markov melding, a technique for joint inference across different models that share some common parameters. He also showed how a joint model can be split into sub-models that are computationally more tractable and efficient.
After lunch the discussion moved to missing data. Dr Karla Diaz-Ordaz, from LSHTM, described the different methods for handling missing data in electronic health records, focusing on settings where the analysis of interest uses a propensity score to estimate causal treatment effects; she clarified the necessary assumptions and highlighted how they differ from the methods and assumptions used with other types of study design and analysis. Next, Dr Cosetta Minelli, from Imperial College, presented her recent work on Bayesian imputation and analysis through hierarchical modelling, and showed how this outperforms other commonly used multiple imputation techniques in the presence of heterogeneity in the data. She also showcased an R Shiny web app that will allow this complex method to be used by epidemiologists and public health researchers.
The final session brought the audience back to the original question of how to estimate direct and causal effects, focusing on the design stage rather than the analysis stage. Professor Bianca De Stavola, from UCL, described how ideas from trials can be adapted to observational studies, focusing in particular on target trial emulation. Dr Sara Geneletti, from LSE, gave an overview of the principles behind regression discontinuity designs and interrupted time series, two methods to consider in natural (quasi-experimental) studies for evaluating policies.
Concluding remarks were provided by Professor Sylvia Richardson from the MRC-BSU; she stressed that as the availability of data from disparate sources increases, observational studies should be exploited to answer causal questions. There are obvious challenges in using these data, but over the years the statistical community has provided robust methods to deal with them. She also commented on the importance of providing scripts, R packages and apps to make the research reproducible and to allow researchers from different backgrounds to use these complex methods in their own work.