Discussion meetings


Discussion meetings are events where articles ('papers for reading') appearing in the Journal of the Royal Statistical Society are presented and discussed. The discussion and authors' replies are then published in the relevant Journal series.

Read more about our discussion meetings, including guidelines for papers for discussion.

Contact Judith Shorten if you would like to make a written contribution to a discussion meeting or join our mailing list for an early invitation to future meetings.

Next Discussion Meeting

'Methods for Estimating the Exposure-Response Curve to Inform the New Safety Standards for Fine Particulate Matter'
Taking place at Imperial College London and online

Imperial College London
Huxley Building (Lecture Theatre 130)
180 Queen’s Gate,
South Kensington Campus
London SW7 2AZ

Thursday, 12 December 2024
Time: 2pm to 4pm

Register in person
Register online
 
Paper: 'Methods for Estimating the Exposure-Response Curve to Inform the New Safety Standards for Fine Particulate Matter'
Authors: Michael Cork, Harvard University, USA (presenting); Daniel Mork and Francesca Dominici, Harvard University, USA (co-authors)

Download the preprint
 
Abstract: Exposure to fine particulate matter (PM2.5) poses significant health risks and accurately determining the shape of the relationship between PM2.5 and health outcomes has crucial policy implications. Although various statistical methods exist to estimate this exposure-response curve (ERC), few studies have compared their performance under plausible data-generating scenarios.

This study compares seven commonly used ERC estimators across 72 exposure-response and confounding scenarios via simulation. Additionally, we apply these methods to estimate the ERC between long-term PM2.5 exposure and all-cause mortality using data from over 68 million Medicare beneficiaries in the United States. Our simulation indicates that regression methods not placed within a causal inference framework are unsuitable when anticipating heterogeneous exposure effects. Under the setting of a large sample size and unknown ERC functional form, we recommend utilizing causal inference methods that allow for nonlinear ERCs.

In our data application, we observe a nonlinear relationship between annual average PM2.5 and all-cause mortality in the Medicare population, with a sharp increase in relative mortality at low PM2.5 concentrations. Our findings suggest that stricter limits on PM2.5 could avert numerous premature deaths. To facilitate the utilization of our results, we provide publicly available, reproducible code on GitHub for every step of the analysis.

The paper will be published in the Journal of the Royal Statistical Society Series A: Statistics in Society.


Past Discussion Meetings

'Analysis of citizen science data' (multi-paper Discussion meeting)
The Royal Statistical Society is pleased to invite you to the discussion of three papers at the annual conference in Brighton on 3 September, 4:45pm to 6:45pm. It is free to attend and open to members and non-members. 
 
The event will be chaired by our President, Andrew Garrett.
 
Paper 1: 'Efficient statistical inference methods for assessing changes in species'
Authors: Emily B Dennis (Butterfly Conservation; University of Kent), Alex Diana (University of Essex), Eleni Matechou (University of Kent), Byron J T Morgan (University of Kent)

Download the preprint
Supplementary materials 
 
Abstract: The global decline of biodiversity, driven by habitat degradation and climate breakdown, is a significant concern. Accurate measures of change are crucial to provide reliable evidence of species’ population changes. Meanwhile citizen science data have witnessed a remarkable expansion in both quantity and sources and serve as the foundation for assessing species’ status. The growing data reservoir presents opportunities for novel and improved inference but often comes with computational costs: computational efficiency is paramount, especially as regular analysis updates are necessary. Building upon recent research, we present illustrations of computationally efficient methods for fitting new models, applied to three major citizen science data sets for butterflies. We extend a method for modelling abundance changes of seasonal organisms, firstly to accommodate multiple years of count data efficiently, and secondly for application to counts from a snapshot mass-participation survey. We also present a variational inference approach for fitting occupancy models efficiently to opportunistic citizen science data. The continuous growth of citizen science data offers unprecedented opportunities to enhance our understanding of how species respond to anthropogenic pressures. Efficient techniques in fitting new models are vital for accurately assessing species’ status, supporting policy-making, setting measurable targets, and enabling effective conservation efforts.
 
Paper 2: 'Frequentist Prediction Sets for Species Abundance using Indirect Information'
Authors: Elizabeth Bersson and Peter D Hoff, Duke University, Durham, USA

Download the preprint
 
Abstract: Citizen science databases that consist of volunteer-led sampling efforts of species communities are relied on as essential sources of data in ecology. Summarising such data across counties with frequentist-valid prediction sets for each county provides an interpretable comparison across counties of varying size or composition. As citizen science data often feature unequal sampling efforts across a spatial domain, prediction sets constructed with indirect methods that share information across counties may be used to improve precision. In this article, we present a nonparametric framework to obtain precise prediction sets for a multinomial random sample based on indirect information that maintain frequentist coverage guarantees for each county. We detail a simple algorithm to obtain prediction sets for each county using indirect information where the computation time does not depend on the sample size and scales nicely with the number of species considered. The indirect information may be estimated by a proposed empirical Bayes procedure based on information from auxiliary data. Our approach makes inference for under-sampled counties more precise, while maintaining area-specific frequentist validity for each county. Our method is used to provide a useful description of avian species abundance in North Carolina, USA based on citizen science data from the eBird database.
 
Paper 3: 'Extreme-value modelling of migratory bird arrival dates: Insights from citizen science data'
Authors: Jonathan Koh, University of Bern, Switzerland and Thomas Opitz, INRAE, France

Download the preprint
 
Abstract: Citizen science mobilises many observers and gathers huge datasets but often without strict sampling protocols, resulting in observation biases due to heterogeneous sampling effort, which can lead to biased statistical inferences. We develop a spatiotemporal Bayesian hierarchical model for bias-corrected estimation of arrival dates of the first migratory bird individuals at a breeding site. Higher sampling effort could be correlated with earlier observed dates. We implement data fusion of two citizen-science datasets with fundamentally different protocols (BBS, eBird) and map posterior distributions of the latent process, which contains four spatial components with Gaussian process priors: species niche; sampling effort; position and scale parameters of annual first arrival date. The data layer includes four response variables: counts of observed eBird locations (Poisson); presence-absence at observed eBird locations (Binomial); BBS occurrence counts (Poisson);  first arrival dates (Generalised Extreme-Value). We devise a Markov Chain Monte Carlo scheme and check by simulation that the latent process components are identifiable. We apply our model to several migratory bird species in the northeastern United States for 2001–2021 and find that the sampling effort significantly modulates the observed first arrival date. We exploit this relationship to effectively bias-correct predictions of the true first arrivals.

The papers will be published in the Journal of the Royal Statistical Society Series A: Statistics in Society.

'Inference for extreme spatial temperature events in a changing climate with application to Ireland'
Monday, 3 June 2024, 3-4pm
Online

Paper:  'Inference for extreme spatial temperature events in a changing climate with application to Ireland'.
Download the preprint

Authors: Dáire Healy, Jonathan Tawn, Peter Thorne and Andrew Parnell.

Abstract:
We investigate the changing nature of the frequency, magnitude, and spatial extent of extreme temperatures in Ireland from 1942 to 2020. We develop an extreme value model that captures spatial and temporal non-stationarity in extreme daily maximum temperature data. We model the tails of the marginal variables using the generalised Pareto distribution and the spatial dependence of extreme events by a semi-parametric Brown-Resnick r-Pareto process, with parameters of each model allowed to change over time. We use weather station observations for modelling extreme events since data from climate models (not conditioned on observational data) can over-smooth these events and have trends determined by the specific climate model configuration. However, climate models do provide valuable information about the detailed physiography over Ireland and the associated climate response. We propose novel methods which exploit the climate model data to overcome issues linked to the sparse and biased sampling of the observations. Our analysis identifies a temporal change in the marginal behaviour of extreme temperature events over the study domain, which is much larger than the change in mean temperature levels over this time window. We illustrate how these characteristics result in increased spatial coverage of the events that exceed critical temperatures.
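As a minimal illustration of the peaks-over-threshold step described above, the sketch below fits a generalised Pareto distribution to exceedances of a high threshold using scipy. The simulated 'daily maximum temperature' series is a placeholder of our own, and the paper's model additionally lets the GPD parameters vary over space and time.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
daily_max_temp = rng.normal(18, 5, size=20_000)   # placeholder for station daily maxima (degrees C)

# Peaks-over-threshold: keep exceedances of a high threshold and fit a
# generalised Pareto distribution to them (the marginal tail model above,
# before any spatial/temporal non-stationarity is added).
u = np.quantile(daily_max_temp, 0.95)
exceedances = daily_max_temp[daily_max_temp > u] - u
shape, _, scale = stats.genpareto.fit(exceedances, floc=0)

# Temperature exceeded by only 1% of threshold exceedances, under the fitted tail
q99 = u + stats.genpareto.ppf(0.99, shape, loc=0, scale=scale)
print(f"threshold u = {u:.1f} C, fitted shape = {shape:.2f}, 99% exceedance level = {q99:.1f} C")
```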

The paper will be published in the Journal of the Royal Statistical Society Series C: Applied Statistics.

'Independent Review of the UK Statistics Authority' by Denise Lievesley 
Took place at the RSS building (Errol St., London) and online
Wednesday, 22 May 2024
Time: 4pm

Author: Denise Lievesley
Chair: Andrew Garrett, RSS President

Paper: The discussion will be based on the published review and the government's response to it: Independent Review of the UK Statistics Authority 2023-2024 (GOV.UK, www.gov.uk)



Safe Testing
Wednesday, 24 January 2024, 4-6pm
Taking place at the RSS building (Errol St., London) and online

Paper: 'Safe Testing'

Authors: Peter Grünwald, CWI and Leiden University, Netherlands; Rianne de Heide, Vrije Universiteit Amsterdam, Netherlands; Wouter Koolen, CWI and University of Twente, Netherlands.
Download the preprint

 

Abstract
We develop the theory of hypothesis testing based on the e-value, a notion of evidence that, unlike the p-value, allows for effortlessly combining results from several studies in the common scenario where the decision to perform a new study may depend on previous outcomes. Tests based on e-values are safe, i.e. they preserve Type-I error guarantees, under such optional continuation. We define growth rate optimality (GRO) as an analogue of power in an optional continuation context, and we show how to construct GRO e-variables for general testing problems with composite null and alternative, emphasizing models with nuisance parameters. GRO e-values take the form of Bayes factors with special priors. We illustrate the theory using several classic examples including a one-sample safe t-test and the 2 × 2 contingency table. Sharing Fisherian, Neymanian and Jeffreys-Bayesian interpretations, e-values may provide a methodology acceptable to adherents of all three schools.
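As a rough illustration of optional continuation, the sketch below combines per-study e-values by multiplication and stops once the running product exceeds 1/alpha. The coin-flip likelihood-ratio e-variable and all names are illustrative assumptions, not the GRO constructions developed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def study_evalue(flips, p_alt=0.7, p_null=0.5):
    """Likelihood-ratio e-value for H0: P(heads) = p_null against a fixed alternative.
    Its expectation under H0 is 1, which is what makes it a valid e-variable."""
    heads = int(flips.sum())
    tails = flips.size - heads
    return (p_alt ** heads * (1 - p_alt) ** tails) / (p_null ** heads * (1 - p_null) ** tails)

# Optional continuation: the decision to run another study may depend on results so far.
# Multiplying e-values keeps the product an e-value, so rejecting once it exceeds
# 1/alpha preserves the Type-I error guarantee.
alpha, running_e = 0.05, 1.0
for study in range(10):
    flips = rng.random(30) < 0.7            # simulate a study in which H0 is false
    running_e *= study_evalue(flips)
    if running_e >= 1 / alpha:
        print(f"Reject H0 after study {study + 1}: running e-value = {running_e:.1f}")
        break
```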

The paper will be published in the Journal of the Royal Statistical Society Series B: Statistical Methodology.



Root and community inference on the latent growth process of a network
Wednesday, 6 December 2023,  4-6pm
Took place at the RSS building (Errol St., London) and online

Paper: 'Root and community inference on the latent growth process of a network'
 

Authors: Harry Crane & Min Xu, Rutgers University, New Brunswick, USA
Download the preprint


Abstract
Many existing statistical models for networks overlook the fact that most real-world networks are formed through a growth process. To address this, we introduce the PAPER (Preferential Attachment Plus Erdős–Rényi) model for random networks, where we let a random network G be the union of a preferential attachment (PA) tree T and additional Erdős–Rényi (ER) random edges. The PA tree component captures the underlying growth/recruitment process of a network where vertices and edges are added sequentially, while the ER component can be regarded as random noise. Given only a single snapshot of the final network G, we study the problem of constructing confidence sets for the early history, in particular the root node, of the unobserved growth process; the root node can be patient zero in a disease infection network or the source of fake news in a social media network. We propose an inference algorithm based on Gibbs sampling that scales to networks with millions of nodes and provide theoretical analysis showing that the expected size of the confidence set is small so long as the noise level of the ER edges is not too large. We also propose variations of the model in which multiple growth processes occur simultaneously, reflecting the growth of multiple communities, and we use these models to provide a new approach to community detection.
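The generative side of the PAPER model is easy to sketch: grow a preferential attachment tree and overlay independent Erdős–Rényi edges. The snippet below does this with networkx under assumed parameter values; it illustrates the model only and says nothing about the Gibbs-sampling root inference.

```python
import networkx as nx

def sample_paper_graph(n=2000, q=0.001, seed=1):
    """Simulate a network in the spirit of the PAPER model: a preferential
    attachment tree (the latent growth process) plus Erdos-Renyi noise edges."""
    tree = nx.barabasi_albert_graph(n, 1, seed=seed)   # m=1 gives a PA tree; low node labels joined early
    noise = nx.gnp_random_graph(n, q, seed=seed)       # independent ER(n, q) edges
    g = nx.Graph()
    g.add_nodes_from(range(n))
    g.add_edges_from(tree.edges())
    g.add_edges_from(noise.edges())
    return g, tree

g, tree = sample_paper_graph()
print(g.number_of_edges(), "edges in G;", tree.number_of_edges(), "of them come from the growth tree")
```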


The paper will be published in the Journal of the Royal Statistical Society Series B: Statistical Methodology.
 
 

Parameterizing and Simulating from Causal Models
Tuesday, 3 October 2023, 4-6pm
Took place at the RSS building (Errol St., London) and online
DeMO (pre-meeting): 2:30pm

Paper: 'Parameterizing and Simulating from Causal Models'
Download the preprint.

Authors: Robin Evans, University of Oxford, UK, and Vanessa Didelez, University of Bremen, Germany.

Abstract
Many statistical problems in causal inference involve a probability distribution other than the one from which data are actually observed; as an additional complication, the object of interest is often a marginal quantity of this other probability distribution. This creates many practical complications for statistical inference, even where the problem is non-parametrically identified. In particular, it is difficult to perform likelihood-based inference, or even to simulate from the model in a general way. We introduce the ‘frugal parameterization’, which places the causal effect of interest at its centre, and then builds the rest of the model around it. We do this in a way that provides a recipe for constructing a regular, non-redundant parameterization using causal quantities of interest. In the case of discrete variables, we can use odds ratios to complete the parameterization, while in the continuous case copulas are the natural choice; other possibilities are also discussed. Our methods allow us to construct and simulate from models with parametrically specified causal distributions, and fit them using likelihood-based methods, including fully Bayesian approaches. Our proposal includes parameterizations for the average causal effect and effect of treatment on the treated, as well as other causal quantities of interest.

The paper will be published in the Journal of the Royal Statistical Society Series B: Statistical Methodology.
 



Probabilistic and Statistical Aspects of Machine Learning (multi-paper Discussion meeting)
Wednesday, 6 September 2023, 5-7pm
Took place at the RSS Conference, Harrogate

Paper 1: 'Automatic Change-Point Detection in Time Series via Deep Learning'.
Download the preprint.
Download the supplementary material.


Authors:
Jie Li, London School of Economics and Political Science
Paul Fearnhead, Lancaster University
Piotr Fryzlewicz, London School of Economics and Political Science
Tengyao Wang, London School of Economics and Political Science.

Abstract:

Detecting change-points in data is challenging because of the range of possible types of change and types of behaviour of data when there is no change. Statistically efficient methods for detecting a change will depend on both of these features, and it can be difficult for a practitioner to develop an appropriate detection method for their application of interest. We show how to automatically generate new offline detection methods based on training a neural network. Our approach is motivated by many existing tests for the presence of a change-point being representable by a simple neural network, and thus a neural network trained with sufficient data should have performance at least as good as these methods. We present theory that quantifies the error rate for such an approach, and how it depends on the amount of training data. Empirical results show that, even with limited training data, its performance is competitive with the standard CUSUM-based classifier for detecting a change in mean when the noise is independent and Gaussian, and can substantially outperform it in the presence of auto-correlated or heavy-tailed noise. Our method also shows strong results in detecting and localizing changes in activity based on accelerometer data.
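For readers unfamiliar with the CUSUM baseline mentioned in the abstract, a minimal sketch of the classical statistic for a single change in mean is given below. It is illustrative only; the paper's contribution is the neural-network classifier against which this baseline is compared.

```python
import numpy as np

def cusum_changepoint(x):
    """Classical CUSUM statistic for a single change in mean: for each split point k,
    compare the means of x[:k] and x[k:] on a variance-standardised scale."""
    x = np.asarray(x, dtype=float)
    n = x.size
    csum = np.cumsum(x)
    k = np.arange(1, n)
    stat = np.abs(np.sqrt(k * (n - k) / n) * (csum[:-1] / k - (csum[-1] - csum[:-1]) / (n - k)))
    khat = int(k[np.argmax(stat)])
    return khat, float(stat.max())

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(1, 1, 200)])   # mean shifts at t = 200
print(cusum_changepoint(x))   # compare the statistic against a threshold to declare a change
```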

Paper 2: 'From Denoising Diffusions to Denoising Markov Models'.
Download the preprint.
Download the supplementary material.

Authors:
Joe Benton, University of Oxford
Yuyang Shi, University of Oxford
Valentin De Bortoli, ENS, Paris, France
George Deligiannidis, University of Oxford
Arnaud Doucet, University of Oxford

Abstract:

Denoising diffusions are state-of-the-art generative models exhibiting remarkable empirical performance. They work by diffusing the data distribution into a Gaussian distribution and then learning to reverse this noising process to obtain synthetic data points. The denoising diffusion relies on approximations of the logarithmic derivatives of the noised data densities using score matching. Such models can also be used to perform approximate posterior simulation when one can only sample from the prior and likelihood. We propose a unifying framework generalizing this approach to a wide class of spaces and leading to an original extension of score matching. We illustrate the resulting models on various applications.

The papers will be published in the Journal of the Royal Statistical Society Series B: Statistical Methodology.
 


A system of population estimates compiled from administrative data only
Tuesday, 27 June 2023, 4-6pm
Took place at the RSS building (Errol St., London) and online
DeMO (pre-meeting): 2:30pm

Paper: 'A system of population estimates compiled from administrative data only'
Download the preprint.
Download supplementary material.

Authors:  John Dunne, Central Statistics Office, Ireland and Li-Chun Zhang, University of Southampton and Statistics Norway.

Abstract
This paper presents a novel system of annual Population Estimates Compiled from Administrative Data Only (PECADO) for Ireland in the absence of a Central Population Register. The system is entirely based on data originating from administrative sources, so that population estimates can be produced even without purposely designed coverage surveys or a periodic census to recalibrate estimates. It requires several extensions to the traditional Dual System Estimation (DSE) methodology, including a restatement of the underlying assumptions, a trimmed DSE method for dealing with erroneous enumerations in the administrative register, and a test for heterogeneous capture probabilities to facilitate the choice of blocking in applications. The PECADO estimates for the years 2011-2016 are compared to the Census counts in 2011 and 2016. We demonstrate how the system can be used to investigate the Census 2016 undercount in Ireland, in place of the traditional approach of deploying additional population coverage surveys.
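For context, the traditional dual system estimator that PECADO extends can be written in a few lines. The sketch below uses hypothetical register counts and the textbook Lincoln-Petersen form, without the paper's trimming or heterogeneity test.

```python
def dual_system_estimate(n_a, n_b, n_ab):
    """Textbook dual system (Lincoln-Petersen) estimator of population size:
    n_a  - records found in administrative register A
    n_b  - records found in administrative register B
    n_ab - records linked to both registers
    Assumes independent lists, homogeneous capture probabilities and no erroneous
    enumerations - exactly the assumptions the PECADO system has to relax."""
    return n_a * n_b / n_ab

# Hypothetical register counts for one blocking cell
print(dual_system_estimate(n_a=900_000, n_b=850_000, n_ab=800_000))   # -> 956250.0
```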

The paper will be published in the Journal of the Royal Statistical Society Series A: Statistics in Society.

Estimating means of bounded random variables by betting
Tuesday, May 23, 2023, 4-6pm (GMT)
Took place at the RSS building (Errol St., London) and online

Paper: 'Estimating means of bounded random variables by betting'
Download the preprint.
Download supplementary material. 

DeMO Introduction:
Download the slides
Access the recording on YouTube
 

Authors: Ian Waudby-Smith and Aaditya Ramdas, Carnegie Mellon University, USA.
 
Abstract
This paper derives confidence intervals (CI) and time-uniform confidence sequences (CS) for the classical problem of estimating an unknown mean from bounded observations. We present a general approach for deriving concentration bounds that can be seen as a generalization and improvement of the celebrated Chernoff method. At its heart, it is based on a class of composite non-negative martingales, with strong connections to testing by betting and the method of mixtures. We show how to extend these ideas to sampling without replacement, another heavily studied problem. In all cases, our bounds are adaptive to the unknown variance, and empirically vastly outperform existing approaches based on Hoeffding or empirical Bernstein inequalities and their recent supermartingale generalizations by Howard et al. [2021]. In short, we establish a new state-of-the-art for four fundamental problems: CSs and CIs for bounded means, when sampling with and without replacement.
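A minimal sketch of the betting idea, under assumptions of our own choosing (a crude fixed bet and a grid of candidate means), is given below; the paper's adaptive bets are substantially tighter, so this is illustrative only.

```python
import numpy as np

def betting_cs(x, alpha=0.05, grid=np.linspace(0.001, 0.999, 999)):
    """Confidence sequence for the mean of [0, 1]-valued data via testing by betting.
    For each candidate mean m we grow two nonnegative capital processes (bets that the
    true mean is above/below m); each has expectation 1 if m is correct, so by Ville's
    inequality m can be discarded once the averaged capital exceeds 1/alpha.
    A crude fixed bet is used here - the paper's adaptive bets are much tighter."""
    x = np.asarray(x, dtype=float)
    lam_up = np.minimum(0.75, 0.5 / grid)          # bet sizes kept inside the no-bankruptcy range
    lam_dn = np.minimum(0.75, 0.5 / (1 - grid))
    cap_up = np.ones_like(grid)
    cap_dn = np.ones_like(grid)
    keep = np.ones_like(grid, dtype=bool)
    for xi in x:
        cap_up *= 1 + lam_up * (xi - grid)
        cap_dn *= 1 - lam_dn * (xi - grid)
        keep &= 0.5 * (cap_up + cap_dn) < 1 / alpha    # running intersection keeps time-uniform validity
    sel = grid[keep]
    return (sel.min(), sel.max()) if sel.size else (float("nan"), float("nan"))

rng = np.random.default_rng(0)
print(betting_cs(rng.beta(2, 5, size=500)))   # true mean 2/7, roughly 0.286
```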

The paper will be published in the Journal of the Royal Statistical Society Series B: Statistical Methodology.
 


Martingale Posterior Distributions
Monday, December 12, 2022, 5-7pm GMT
Took place at the RSS building (Errol St., London) and online
DeMO (pre-meeting): 3:30pm (Errol Street and online).

Paper: 'Martingale Posterior Distributions'
Download the preprint.

Authors: Edwin Fong (The Alan Turing Institute and University of Oxford), Chris Holmes (The Alan Turing Institute and University of Oxford) and Stephen G. Walker (University of Texas at Austin, USA)

Abstract
The prior distribution is the usual starting point for Bayesian uncertainty. In this paper, we present a different perspective which focuses on missing observations as the source of statistical uncertainty, with the parameter of interest being known precisely given the entire population. We argue that the foundation of Bayesian inference is to assign a distribution on missing observations conditional on what has been observed. In the i.i.d. setting with an observed sample of size n, the Bayesian would thus assign a predictive distribution on the missing Yn+1:∞ conditional on Y1:n, which then induces a distribution on the parameter. We utilize Doob's theorem, which relies on martingales, to show that choosing the Bayesian predictive distribution returns the conventional posterior as the distribution of the parameter. Taking this as our cue, we relax the predictive machine, avoiding the need for the predictive to be derived solely from the usual prior to posterior to predictive density formula. We introduce the martingale posterior distribution, which returns Bayesian uncertainty on any statistic via the direct specification of the joint predictive. To that end, we introduce new predictive methodologies for multivariate density estimation, regression and classification that build upon recent work on bivariate copulas.
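A toy version of the idea is predictive resampling with a simple Pólya-urn (Bayesian-bootstrap) predictive: repeatedly impute the missing observations from the predictive and recompute the statistic of interest. The sketch below is an illustrative choice of ours, not the copula-based predictives proposed in the paper.

```python
import numpy as np

def martingale_posterior_mean(y, n_draws=1000, horizon=2000, seed=0):
    """Martingale posterior for the mean via predictive resampling.
    Predictive used here: a simple Polya-urn / Bayesian-bootstrap update, i.e. each
    missing observation is drawn from the empirical distribution of everything seen
    so far. Each forward pass imputes Y_{n+1}, ..., Y_{n+horizon}; the mean of the
    completed sample is one draw from the (approximate) martingale posterior."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    draws = np.empty(n_draws)
    for b in range(n_draws):
        urn = list(y)
        for _ in range(horizon):
            urn.append(urn[rng.integers(len(urn))])    # predictive draw given data so far
        draws[b] = np.mean(urn)
    return draws

y = np.random.default_rng(1).normal(loc=2.0, scale=1.0, size=50)
post = martingale_posterior_mean(y)
print(post.mean().round(2), np.quantile(post, [0.025, 0.975]).round(2))
```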

The paper will be published in the Journal of the Royal Statistical Society Series B: Statistical Methodology.



Flexible marked spatio-temporal point processes with applications to event sequences from association football
Tuesday, November 22, 2022, 11am-1pm GMT
DeMO (pre-meeting): 9:30 am
Online 

Paper: 'Flexible marked spatio-temporal point processes with applications to event sequences from association football'.
Download the preprint

Authors: Santhosh Narayanan (The Alan Turing Institute), Ioannis Kosmidis (Department of Statistics, University of Warwick) and Petros Dellaportas (Department of Statistical Science, University College London; Department of Statistics, Athens University of Economics and Business)

Abstract
We develop a new family of marked point processes by focusing the characteristic properties of marked Hawkes processes exclusively on the space of marks, providing the freedom to specify a different model for the occurrence times. This is possible through the decomposition of the joint distribution of marks and times that allows us to separately specify the conditional distribution of marks given the filtration of the process and the current time.

We develop a Bayesian framework for the inference and prediction from this family of marked point processes that can naturally accommodate process and point-specific covariate information to drive cross-excitations, offering wide flexibility and applicability in the modelling of real-world processes. The framework is used here for the modelling of in-game event sequences from association football, resulting not only in inferences about previously unquantified characteristics of game dynamics and extraction of event-specific team abilities, but also in predictions for the occurrence of events of interest, such as goals, corners or fouls in a specified interval of time.

The paper will be published in the Journal of the Royal Statistical Society Series C: Applied Statistics.
 

Statistical Aspects of Climate Change

Wednesday, September 14, 2022, 5-7pm BST
Took place at the RSS Conference, Aberdeen

A multi-paper meeting featuring two discussion papers and organised by the RSS Discussion Meetings Committee and RSS Environmental Statistics Section

Paper 1: ‘Assessing present and future risk of water damage using building attributes, meteorology and topography’
Download the preprint.
Erratum.
Link to supporting data.

Authors: Claudio Heinrich-Mertsching*, Jens Christian Wahl*, Alba Ordonez*, Marita Stien#, John Elvsborg#, Ola Haug*, Thordis L. Thorarinsdottir*
 
* Norwegian Computing Center, Oslo, Norway
# Gjensidige Forsikring ASA, Oslo, Norway

Abstract
Weather-related risk makes the insurance industry inevitably concerned with climate and climate change. Buildings hit by pluvial flooding are a key manifestation of this risk, giving rise to compensations for the induced physical damages and business interruptions. In this work, we establish a nationwide, building-specific risk score for water damage associated with pluvial flooding in Norway. We fit a generalized additive model that relates the number of water damages to a wide range of explanatory variables that can be categorized into building attributes, climatological variables and topographical characteristics. The model assigns a risk score to every location in Norway, based on local topography and climate, which is not only useful for insurance companies, but also for city planning. Combining our model with an ensemble of climate projections allows us to project the (spatially varying) impacts of climate change on the risk of pluvial flooding towards the middle and end of the 21st century.
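As an illustration of the kind of model described, the sketch below fits a regression-spline Poisson GLM (a rough stand-in for the paper's generalized additive model) to hypothetical building-level claim data; all variable names and the simulated data are assumptions of ours.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5_000
rain = rng.gamma(8, 100, n)                    # hypothetical annual rainfall (mm)
slope = rng.uniform(0, 25, n)                  # hypothetical terrain slope (degrees)
age = rng.integers(0, 120, n)                  # hypothetical building age (years)
exposure = rng.uniform(0.5, 1.0, n)            # years insured
lam = exposure * np.exp(-4 + 0.0015 * rain - 0.02 * slope)
df = pd.DataFrame({"n_claims": rng.poisson(lam), "annual_rain": rain,
                   "slope": slope, "building_age": age, "exposure": exposure})

# Smooth (spline) effects of rainfall and slope on the expected number of
# water-damage claims, with an exposure offset -- a rough stand-in for a GAM.
model = smf.glm("n_claims ~ bs(annual_rain, df=4) + bs(slope, df=4) + building_age",
                data=df, family=sm.families.Poisson(),
                offset=np.log(df["exposure"])).fit()
print(model.params.round(3))
```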

Paper 2: 'The importance of context in extreme value analysis with application to extreme temperatures in the USA and Greenland'
Download the preprint.
Link to supporting data

Authors: Daniel Clarkson, Emma Eastoe and Amber Leeson, Lancaster University, UK

Abstract
Statistical extreme value models allow estimation of the frequency, magnitude and spatio-temporal extent of extreme temperature events in the presence of climate change. Unfortunately, the assumptions of many standard methods are not valid for complex environmental data sets, with a realistic statistical model requiring appropriate incorporation of scientific context. We examine two case studies in which the application of routine extreme value methods results in inappropriate models and inaccurate predictions. In the first scenario, record-breaking temperatures experienced in the US in the summer of 2021 are found to exceed the maximum feasible temperature predicted from a standard extreme value analysis of pre-2021 data. Incorporating random effects into the standard methods accounts for additional variability in the model parameters, reflecting shifts in unobserved climatic drivers and permitting greater accuracy in return period prediction. The second scenario examines ice surface temperatures in Greenland. The temperature distribution is found to have a poorly-defined upper tail, with a spike in observations just below 0°C and an unexpectedly large number of measurements above this value. A Gaussian mixture model fit to the full range of measurements is found to improve fit and predictive abilities in the upper tail when compared to traditional extreme value methods.
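A minimal sketch of the second case study's modelling choice, fitting a Gaussian mixture to the full temperature range with scikit-learn, is shown below; the data are simulated placeholders shaped only loosely like the Greenland measurements described.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Simulated stand-in for ice-surface temperatures: a cold bulk plus a spike of
# observations just below 0 C from melting surfaces, as described above.
temps = np.concatenate([rng.normal(-15, 8, 9_000), rng.normal(-0.3, 0.4, 1_000)])

# Fit a Gaussian mixture to the full range of measurements; the fitted density can
# then be interrogated in the poorly-defined upper tail near 0 C.
gmm = GaussianMixture(n_components=3, random_state=0).fit(temps.reshape(-1, 1))

grid = np.linspace(-2.0, 2.0, 5).reshape(-1, 1)
density = np.exp(gmm.score_samples(grid))
for t, d in zip(grid.ravel(), density):
    print(f"density at {t:+.1f} C: {d:.4f}")
```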

The papers will be published in the Journal of the Royal Statistical Society Series C: Applied Statistics.
 


 

Statistical Aspects of the Covid-19 Pandemic 
2nd Multi-paper Discussion Meeting

Took place Thursday, June 16, 2022, 4-6pm BST

Paper 1: ‘Bayesian semi-mechanistic modelling of COVID-19: identifiability, sensitivity, and policy implications’
Download the preprint.

Authors: Samir Bhatt, Neil Ferguson, Seth Flaxman, Axel Gandy, Swapnil Mishra, James Scott

Abstract
We propose a general Bayesian approach to modeling epidemics such as COVID-19. The approach grew out of specific analyses conducted during the pandemic, in particular an analysis concerning the effects of non-pharmaceutical interventions (NPIs) in reducing COVID-19 transmission in 11 European countries. The model parameterizes the time-varying reproduction number Rt through a regression framework in which covariates can be, for example, governmental interventions or changes in mobility patterns. This allows a joint fit across regions and partial pooling to share strength. This innovation was critical to our timely estimates of the impact of lockdown and other NPIs in the European epidemics, whose validity was borne out by the subsequent course of the epidemic. Our framework provides a fully generative model for latent infections and observations deriving from them, including deaths, cases, hospitalizations, ICU admissions and seroprevalence surveys. One issue surrounding our model's use during the COVID-19 pandemic is the confounded nature of NPIs and mobility. We use our framework to explore this issue. We have open-sourced an R package, epidemia, implementing our approach in Stan. Versions of the model are used by New York State, Tennessee and Scotland to estimate the current situation and make policy decisions.

Paper 2: ‘A sequential Monte Carlo approach for estimation of time-varying reproduction numbers for Covid-19’
Download the preprint.

Authors: Geir Storvik, Alfonso Diz-Lois Palomares, Solveig Engebretsen, Gunnar Rø, Kenth Engo-Monsen, Anja Kristoffersen, Birgitte De Blasio, Arnoldo Frigessi

Abstract
The Covid-19 pandemic has required most countries to implement complex sequences of non-pharmaceutical interventions, with the aim of controlling the transmission of the virus in the population. To be able to take rapid decisions, a detailed understanding of the current situation is necessary. Estimates of time-varying, instantaneous reproduction numbers represent a way to quantify the viral transmission in real time. They are often defined through a mathematical compartmental model of the epidemic, like a stochastic SEIR model, whose parameters must be estimated from multiple time series of epidemiological data. Because of very high dimensional parameter spaces (partly due to the stochasticity in the spread models) and incomplete and delayed data, inference is very challenging. We propose a state space formalisation of the model and a sequential Monte Carlo approach which allows us to estimate a daily-varying reproduction number for the Covid-19 epidemic in Norway with sufficient precision, on the basis of daily hospitalisation and positive test incidences. The method was in regular use in Norway during the pandemic and appears to be a powerful instrument for epidemic monitoring and management.



Paper: ‘Vintage Factor Analysis with Varimax Performs Statistical Inference’
Authors: Karl Rohe and Muzhe Zeng, University of Wisconsin-Madison, USA

Took place on Wednesday, 11 May 2022 3-5pm (BST)

Abstract
Psychologists developed Multiple Factor Analysis to decompose multivariate data into a small number of interpretable factors without any a priori knowledge about those factors [Thurstone, 1935]. In this form of factor analysis, the Varimax factor rotation redraws the axes through the multidimensional factors to make them sparse and thus make them more interpretable [Kaiser, 1958].

Charles Spearman and many others objected to factor rotations because the factors seem to be rotationally invariant [Thurstone, 1947, Anderson and Rubin, 1956]. These objections are still reported in all contemporary multivariate statistics textbooks. However, this vintage form of factor analysis has survived and is widely popular because, empirically, the factor rotation often makes the factors easier to interpret. We argue that the rotation makes the factors easier to interpret because, in fact, the Varimax factor rotation performs statistical inference.

We show that Principal Components Analysis (PCA) with the Varimax axes provides a unified spectral estimation strategy for a broad class of semi-parametric factor models, including the Stochastic Blockmodel and a natural variation of Latent Dirichlet Allocation (i.e. 'topic modeling'). In addition, we show that Thurstone's widely employed sparsity diagnostics implicitly assess a key leptokurtic condition that makes the axes statistically identifiable in these models. Taken together, this shows that the know-how of Vintage Factor Analysis performs statistical inference, reversing nearly a century of statistical thinking on the topic. We illustrate the use of these techniques on two large bibliometric examples (a citation network and a text corpus). With a sparse eigensolver, PCA with Varimax is both fast and stable. Combined with Thurstone's straightforward diagnostics, this vintage approach is suitable for a wide array of modern applications.
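A compact sketch of 'PCA with Varimax' on simulated leptokurtic factors is given below; the varimax routine is the standard Kaiser iteration and the toy data are our own assumption, intended only to show the rotated components roughly recovering sparse factors up to sign and permutation.

```python
import numpy as np

def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-6):
    """Kaiser's varimax rotation of a loadings/score matrix Phi (rows: items, cols: factors)."""
    p, k = Phi.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lam = Phi @ R
        u, s, vh = np.linalg.svd(
            Phi.T @ (Lam ** 3 - (gamma / p) * Lam @ np.diag(np.diag(Lam.T @ Lam))))
        R = u @ vh
        d_new = s.sum()
        if d_new < d * (1 + tol):
            break
        d = d_new
    return Phi @ R

rng = np.random.default_rng(0)
Z = rng.exponential(size=(500, 2))                       # leptokurtic latent factors
W = np.array([[2, 0], [2, 0], [0, 2], [0, 2], [1, 1]], dtype=float)
X = Z @ W.T + 0.1 * rng.normal(size=(500, 5))

Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)        # PCA via the SVD
rotated = varimax(U[:, :2])                              # varimax-rotated component scores
# Each rotated component should correlate mainly with one latent factor
print(np.round(np.abs(np.corrcoef(rotated.T, Z.T))[:2, 2:], 2))
```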

Download the preprint

The paper will be published in the Journal of the Royal Statistical Society, Series B.



Paper: 'Experimental Evaluation of Algorithm-Assisted Human Decision-Making: Application to Pretrial Public Safety Assessment'
Authors: Imai et al.
Tuesday, 8 February 2022
To be published in JRSSA.

Abstract
Despite an increasing reliance on fully-automated algorithmic decision-making in our lives, human beings still make consequential decisions. We develop a statistical methodology for experimentally evaluating the causal impacts of algorithmic recommendations on human decisions. We also show how to examine whether algorithmic recommendations improve the fairness of human decisions and derive the optimal decision rules under various settings. We apply the proposed methodology to preliminary data from the first-ever randomized controlled trial that evaluates the pretrial Public Safety Assessment (PSA) in the criminal justice system. A goal of the PSA is to help judges decide which arrested individuals should be released. We find that providing the PSA to the judge has little overall impact on the judge's decisions and subsequent arrestee behavior. However, we find that the PSA may help avoid unnecessarily harsh decisions for female arrestees while it encourages the judge to make stricter decisions for male arrestees who are deemed to be risky. For fairness, the PSA appears to increase the gender bias against males while having little effect on any existing racial differences in judges' decisions. Finally, we find that the PSA's recommendations might be unnecessarily severe unless the cost of a new crime is sufficiently high.
Download the preprint



 

View our playlist of recent Discussion Meetings
Read past Discussion Papers