Discussion meetings


Discussion meetings are events where articles ('papers for reading') appearing in the Journal of the Royal Statistical Society are presented and discussed. The discussion and authors' replies are then published in the relevant Journal series.

Read more about our discussion meetings, including guidelines for papers for discussion.

Contact Judith Shorten if you would like to make a written contribution to a discussion meeting or join our mailing list for an early invitation to future meetings.

Next Discussion Meeting

'Methods for Estimating the Exposure-Response Curve to Inform the New Safety Standards for Fine Particulate Matter'
Taking place at Imperial College London and online

Imperial College London
Huxley Building (Lecture Theatre 130)
180 Queen’s Gate,
South Kensington Campus
London SW7 2AZ

Thursday, 12 December 2024
Time: 2pm to 4pm

Register in person
Register online
 
Paper: 'Methods for Estimating the Exposure-Response Curve to Inform the New Safety Standards for Fine Particulate Matter'
Authors: Michael Cork, Harvard University, USA (presenting); Daniel Mork and Francesca Dominici, Harvard University, USA (co-authors)

Download the preprint
 
Abstract: Exposure to fine particulate matter (PM2.5) poses significant health risks and accurately determining the shape of the relationship between PM2.5 and health outcomes has crucial policy implications. Although various statistical methods exist to estimate this exposure-response curve (ERC), few studies have compared their performance under plausible data-generating scenarios.

This study compares seven commonly used ERC estimators across 72 exposure-response and confounding scenarios via simulation. Additionally, we apply these methods to estimate the ERC between long-term PM2.5 exposure and all-cause mortality using data from over 68 million Medicare beneficiaries in the United States. Our simulation indicates that regression methods not placed within a causal inference framework are unsuitable when anticipating heterogeneous exposure effects. Under the setting of a large sample size and unknown ERC functional form, we recommend utilizing causal inference methods that allow for nonlinear ERCs.

In our data application, we observe a nonlinear relationship between annual average PM2.5 and all-cause mortality in the Medicare population, with a sharp increase in relative mortality at low PM2.5 concentrations. Our findings suggest that stricter limits on PM2.5 could avert numerous premature deaths. To facilitate the utilization of our results, we provide publicly available, reproducible code on GitHub for every step of the analysis.

The paper will be published in the Journal of the Royal Statistical Society Series A: Statistics in Society.


Past Discussion Meetings

'Analysis of citizen science data' (multi-paper Discussion meeting)
The Royal Statistical Society is pleased to invite you to the discussion of three papers at the annual conference in Brighton on 3 September, 4:45pm to 6:45pm. It is free to attend and open to members and non-members. 
 
The event will be chaired by our President, Andrew Garrett.
 
Paper 1: 'Efficient statistical inference methods for assessing changes in species'
Authors: Emily B Dennis (Butterfly Conservation; University of Kent), Alex Diana (University of Essex), Eleni Matechou (University of Kent), Byron J T Morgan (University of Kent)

Download the preprint
Supplementary materials 
 
Abstract: The global decline of biodiversity, driven by habitat degradation and climate breakdown, is a significant concern. Accurate measures of change are crucial to provide reliable evidence of species’ population changes. Meanwhile citizen science data have witnessed a remarkable expansion in both quantity and sources and serve as the foundation for assessing species’ status. The growing data reservoir presents opportunities for novel and improved inference but often comes with computational costs: computational efficiency is paramount, especially as regular analysis updates are necessary. Building upon recent research, we present illustrations of computationally efficient methods for fitting new models, applied to three major citizen science data sets for butterflies. We extend a method for modelling abundance changes of seasonal organisms, firstly to accommodate multiple years of count data efficiently, and secondly for application to counts from a snapshot mass-participation survey. We also present a variational inference approach for fitting occupancy models efficiently to opportunistic citizen science data. The continuous growth of citizen science data offers unprecedented opportunities to enhance our understanding of how species respond to anthropogenic pressures. Efficient techniques in fitting new models are vital for accurately assessing species’ status, supporting policy-making, setting measurable targets, and enabling effective conservation efforts.
 
Paper 2: 'Frequentist Prediction Sets for Species Abundance using Indirect Information'
Authors: Elizabeth Bersson and Peter D Hoff, Duke University, Durham, USA

Download the preprint
 
Abstract: Citizen science databases that consist of volunteer-led sampling efforts of species communities are relied on as essential sources of data in ecology. Summarising such data across counties with frequentist-valid prediction sets for each county provides an interpretable comparison across counties of varying size or composition. As citizen science data often feature unequal sampling efforts across a spatial domain, prediction sets constructed with indirect methods that share information across counties may be used to improve precision. In this article, we present a nonparametric framework to obtain precise prediction sets for a multinomial random sample based on indirect information that maintain frequentist coverage guarantees for each county. We detail a simple algorithm to obtain prediction sets for each county using indirect information where the computation time does not depend on the sample size and scales nicely with the number of species considered. The indirect information may be estimated by a proposed empirical Bayes procedure based on information from auxiliary data. Our approach makes inference for under-sampled counties more precise, while maintaining area-specific frequentist validity for each county. Our method is used to provide a useful description of avian species abundance in North Carolina, USA based on citizen science data from the eBird database.
 
Paper 3: 'Extreme-value modelling of migratory bird arrival dates: Insights from citizen science data'
Authors: Jonathan Koh, University of Bern, Switzerland and Thomas Opitz, INRAE, France

Download the preprint
 
Abstract: Citizen science mobilises many observers and gathers huge datasets but often without strict sampling protocols, resulting in observation biases due to heterogeneous sampling effort, which can lead to biased statistical inferences. We develop a spatiotemporal Bayesian hierarchical model for bias-corrected estimation of arrival dates of the first migratory bird individuals at a breeding site. Higher sampling effort could be correlated with earlier observed dates. We implement data fusion of two citizen-science datasets with fundamentally different protocols (BBS, eBird) and map posterior distributions of the latent process, which contains four spatial components with Gaussian process priors: species niche; sampling effort; position and scale parameters of annual first arrival date. The data layer includes four response variables: counts of observed eBird locations (Poisson); presence-absence at observed eBird locations (Binomial); BBS occurrence counts (Poisson);  first arrival dates (Generalised Extreme-Value). We devise a Markov Chain Monte Carlo scheme and check by simulation that the latent process components are identifiable. We apply our model to several migratory bird species in the northeastern United States for 2001–2021 and find that the sampling effort significantly modulates the observed first arrival date. We exploit this relationship to effectively bias-correct predictions of the true first arrivals.

The papers will be published in the Journal of the Royal Statistical Society Series A: Statistics in Society.

'Inference for extreme spatial temperature events in a changing climate with application to Ireland'
Monday, 3 June 2024, 3-4pm
Online

Paper:  'Inference for extreme spatial temperature events in a changing climate with application to Ireland'.
Download the preprint

Authors: Dáire Healy, Jonathan Tawn, Peter Thorne and Andrew Parnell.

Abstract:
We investigate the changing nature of the frequency, magnitude, and spatial extent of extreme temperatures in Ireland from 1942 to 2020. We develop an extreme value model that captures spatial and temporal non-stationarity in extreme daily maximum temperature data. We model the tails of the marginal variables using the generalised Pareto distribution and the spatial dependence of extreme events by a semi-parametric Brown-Resnick r-Pareto process, with parameters of each model allowed to change over time. We use weather station observations for modelling extreme events since data from climate models (not conditioned on observational data) can over-smooth these events and have trends determined by the specific climate model configuration. However, climate models do provide valuable information about the detailed physiography over Ireland and the associated climate response. We propose novel methods which exploit the climate model data to overcome issues linked to the sparse and biased sampling of the observations. Our analysis identifies a temporal change in the marginal behaviour of extreme temperature events over the study domain, which is much larger than the change in mean temperature levels over this time window. We illustrate how these characteristics result in increased spatial coverage of the events that exceed critical temperatures.
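As a minimal illustration of the peaks-over-threshold step described above, the sketch below fits a generalised Pareto distribution to exceedances of a high threshold using scipy. The simulated 'daily maximum temperature' series is a placeholder of our own, and the paper's model additionally lets the GPD parameters vary over space and time.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
daily_max_temp = rng.normal(18, 5, size=20_000)   # placeholder for station daily maxima (degrees C)

# Peaks-over-threshold: keep exceedances of a high threshold and fit a
# generalised Pareto distribution to them (the marginal tail model above,
# before any spatial/temporal non-stationarity is added).
u = np.quantile(daily_max_temp, 0.95)
exceedances = daily_max_temp[daily_max_temp > u] - u
shape, _, scale = stats.genpareto.fit(exceedances, floc=0)

# Temperature exceeded by only 1% of threshold exceedances, under the fitted tail
q99 = u + stats.genpareto.ppf(0.99, shape, loc=0, scale=scale)
print(f"threshold u = {u:.1f} C, fitted shape = {shape:.2f}, 99% exceedance level = {q99:.1f} C")
```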

The paper will be published in the Journal of the Royal Statistical Society Series C: Applied Statistics.

'Independent Review of the UK Statistics Authority' by Denise Lievesley 
Took place at the RSS building (Errol St., London) and online
Wednesday, 22 May 2024
Time: 4pm

Author: Denise Lievesley
Chair: Andrew Garrett, RSS President

Paper: The discussion will be based on the published review and the government's response to it: Independent Review of the UK Statistics Authority 2023-2024 (GOV.UK, www.gov.uk)



Safe Testing
Wednesday, 24 January 2024, 4-6pm
Taking place at the RSS building (Errol St., London) and online

Paper: 'Safe Testing'

Authors: Peter Grünwald, CWI and Leiden University, Netherlands; Rianne de Heide, Vrije Universiteit Amsterdam, Netherlands; Wouter Koolen, CWI and University of Twente, Netherlands.
Download the preprint

 

Abstract
We develop the theory of hypothesis testing based on the e-value, a notion of evidence that, unlike the p-value, allows for effortlessly combining results from several studies in the common scenario where the decision to perform a new study may depend on previous outcomes. Tests based on e-values are safe, i.e. they preserve Type-I error guarantees, under such optional continuation. We define growth rate optimality (GRO) as an analogue of power in an optional continuation context, and we show how to construct GRO e-variables for general testing problems with composite null and alternative, emphasizing models with nuisance parameters. GRO e-values take the form of Bayes factors with special priors. We illustrate the theory using several classic examples including a one-sample safe t-test and the 2 × 2 contingency table. Sharing Fisherian, Neymanian and Jeffreys-Bayesian interpretations, e-values may provide a methodology acceptable to adherents of all three schools.
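As a rough illustration of optional continuation, the sketch below combines per-study e-values by multiplication and stops once the running product exceeds 1/alpha. The coin-flip likelihood-ratio e-variable and all names are illustrative assumptions, not the GRO constructions developed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def study_evalue(flips, p_alt=0.7, p_null=0.5):
    """Likelihood-ratio e-value for H0: P(heads) = p_null against a fixed alternative.
    Its expectation under H0 is 1, which is what makes it a valid e-variable."""
    heads = int(flips.sum())
    tails = flips.size - heads
    return (p_alt ** heads * (1 - p_alt) ** tails) / (p_null ** heads * (1 - p_null) ** tails)

# Optional continuation: the decision to run another study may depend on results so far.
# Multiplying e-values keeps the product an e-value, so rejecting once it exceeds
# 1/alpha preserves the Type-I error guarantee.
alpha, running_e = 0.05, 1.0
for study in range(10):
    flips = rng.random(30) < 0.7            # simulate a study in which H0 is false
    running_e *= study_evalue(flips)
    if running_e >= 1 / alpha:
        print(f"Reject H0 after study {study + 1}: running e-value = {running_e:.1f}")
        break
```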

The paper will be published in the Journal of the Royal Statistical Society Series B: Statistical Methodology.



Root and community inference on the latent growth process of a network
Wednesday, 6 December 2023,  4-6pm
Took place at the RSS building (Errol St., London) and online

Paper: 'Root and community inference on the latent growth process of a network'
 

Authors: Harry Crane & Min Xu, Rutgers University, New Brunswick, USA
Download the preprint


Abstract
Many existing statistical models for networks overlook the fact that most real-world networks are formed through a growth process. To address this, we introduce the PAPER (Preferential Attachment Plus Erdős–Rényi) model for random networks, where we let a random network G be the union of a preferential attachment (PA) tree T and additional Erdős–Rényi (ER) random edges. The PA tree component captures the underlying growth/recruitment process of a network where vertices and edges are added sequentially, while the ER component can be regarded as random noise. Given only a single snapshot of the final network G, we study the problem of constructing confidence sets for the early history, in particular the root node, of the unobserved growth process; the root node can be patient zero in a disease infection network or the source of fake news in a social media network. We propose an inference algorithm based on Gibbs sampling that scales to networks with millions of nodes and provide theoretical analysis showing that the expected size of the confidence set is small so long as the noise level of the ER edges is not too large. We also propose variations of the model in which multiple growth processes occur simultaneously, reflecting the growth of multiple communities, and we use these models to provide a new approach to community detection.
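The generative side of the PAPER model is easy to sketch: grow a preferential attachment tree and overlay independent Erdős–Rényi edges. The snippet below does this with networkx under assumed parameter values; it illustrates the model only and says nothing about the Gibbs-sampling root inference.

```python
import networkx as nx

def sample_paper_graph(n=2000, q=0.001, seed=1):
    """Simulate a network in the spirit of the PAPER model: a preferential
    attachment tree (the latent growth process) plus Erdos-Renyi noise edges."""
    tree = nx.barabasi_albert_graph(n, 1, seed=seed)   # m=1 gives a PA tree; low node labels joined early
    noise = nx.gnp_random_graph(n, q, seed=seed)       # independent ER(n, q) edges
    g = nx.Graph()
    g.add_nodes_from(range(n))
    g.add_edges_from(tree.edges())
    g.add_edges_from(noise.edges())
    return g, tree

g, tree = sample_paper_graph()
print(g.number_of_edges(), "edges in G;", tree.number_of_edges(), "of them come from the growth tree")
```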


The paper will be published in the Journal of the Royal Statistical Society Series B: Statistical Methodology.
 
 

Parameterizing and Simulating from Causal Models
Tuesday, 3 October 2023, 4-6pm
Took place at the RSS building (Errol St., London) and online
DeMO (pre-meeting): 2:30pm

Paper: 'Parameterizing and Simulating from Causal Models'
Download the preprint.

Authors: Robin Evans, University of Oxford, UK, and Vanessa Didelez, University of Bremen, Germany.

Abstract
Many statistical problems in causal inference involve a probability distribution other than the one from which data are actually observed; as an additional complication, the object of interest is often a marginal quantity of this other probability distribution. This creates many practical complications for statistical inference, even where the problem is non-parametrically identified. In particular, it is difficult to perform likelihood-based inference, or even to simulate from the model in a general way. We introduce the ‘frugal parameterization’, which places the causal effect of interest at its centre, and then builds the rest of the model around it. We do this in a way that provides a recipe for constructing a regular, non-redundant parameterization using causal quantities of interest. In the case of discrete variables, we can use odds ratios to complete the parameterization, while in the continuous case copulas are the natural choice; other possibilities are also discussed. Our methods allow us to construct and simulate from models with parametrically specified causal distributions, and fit them using likelihood-based methods, including fully Bayesian approaches. Our proposal includes parameterizations for the average causal effect and effect of treatment on the treated, as well as other causal quantities of interest.

The paper will be published in the Journal of the Royal Statistical Society Series B: Statistical Methodology.
 



Probabilistic and Statistical Aspects of Machine Learning (multi-paper Discussion meeting)
Wednesday, 6 September 2023, 5-7pm
Took place at the RSS Conference, Harrogate

Paper 1: 'Automatic Change-Point Detection in Time Series via Deep Learning'.
Download the preprint.
Download the supplementary material.


Authors:
Jie Li, London School of Economics and Political Science
Paul Fearnhead, Lancaster University
Piotr Fryzlewicz, London School of Economics and Political Science
Tengyao Wang, London School of Economics and Political Science.

Abstract:

Detecting change-points in data is challenging because of the range of possible types of change and types of behaviour of data when there is no change. Statistically efficient methods for detecting a change will depend on both of these features, and it can be difficult for a practitioner to develop an appropriate detection method for their application of interest. We show how to automatically generate new offline detection methods based on training a neural network. Our approach is motivated by many existing tests for the presence of a change-point being representable by a simple neural network, and thus a neural network trained with sufficient data should have performance at least as good as these methods. We present theory that quantifies the error rate for such an approach, and how it depends on the amount of training data. Empirical results show that, even with limited training data, its performance is competitive with the standard CUSUM-based classifier for detecting a change in mean when the noise is independent and Gaussian, and can substantially outperform it in the presence of auto-correlated or heavy-tailed noise. Our method also shows strong results in detecting and localizing changes in activity based on accelerometer data.
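For readers unfamiliar with the CUSUM baseline mentioned in the abstract, a minimal sketch of the classical statistic for a single change in mean is given below. It is illustrative only; the paper's contribution is the neural-network classifier against which this baseline is compared.

```python
import numpy as np

def cusum_changepoint(x):
    """Classical CUSUM statistic for a single change in mean: for each split point k,
    compare the means of x[:k] and x[k:] on a variance-standardised scale."""
    x = np.asarray(x, dtype=float)
    n = x.size
    csum = np.cumsum(x)
    k = np.arange(1, n)
    stat = np.abs(np.sqrt(k * (n - k) / n) * (csum[:-1] / k - (csum[-1] - csum[:-1]) / (n - k)))
    khat = int(k[np.argmax(stat)])
    return khat, float(stat.max())

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(1, 1, 200)])   # mean shifts at t = 200
print(cusum_changepoint(x))   # compare the statistic against a threshold to declare a change
```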

Paper 2: 'From Denoising Diffusions to Denoising Markov Models'.
Download the preprint.
Download the supplementary material.

Authors:
Joe Benton, University of Oxford
Yuyang Shi, University of Oxford
Valentin De Bortoli, ENS, Paris, France
George Deligiannidis, University of Oxford
Arnaud Doucet, University of Oxford

Abstract:

Denoising diffusions are state-of-the-art generative models exhibiting remarkable empirical performance. They work by diffusing the data distribution into a Gaussian distribution and then learning to reverse this noising process to obtain synthetic data points. The denoising diffusion relies on approximations of the logarithmic derivatives of the noised data densities using score matching. Such models can also be used to perform approximate posterior simulation when one can only sample from the prior and likelihood. We propose a unifying framework generalizing this approach to a wide class of spaces and leading to an original extension of score matching. We illustrate the resulting models on various applications.

The papers will be published in the Journal of the Royal Statistical Society Series B: Statistical Methodology.
 


A system of population estimates compiled from administrative data only
Tuesday, 27 June 2023, 4-6pm
Took place at the RSS building (Errol St., London) and online
DeMO (pre-meeting): 2:30pm

Paper: 'A system of population estimates compiled from administrative data only'
Download the preprint.
Download supplementary material.

Authors:  John Dunne, Central Statistics Office, Ireland and Li-Chun Zhang, University of Southampton and Statistics Norway.

Abstract
This paper presents a novel system of annual Population Estimates Compiled from Administrative Data Only (PECADO) for Ireland in the absence of a Central Population Register. The system is entirely based on data originating from administrative sources, so that population estimates can be produced even without purposely designed coverage surveys or a periodic census to recalibrate estimates. It requires several extensions to the traditional Dual System Estimation (DSE) methodology, including a restatement of the underlying assumptions, a trimmed DSE method for dealing with erroneous enumerations in the administrative register, and a test for heterogeneous capture probabilities to facilitate the choice of blocking in applications. The PECADO estimates for the years 2011-2016 are compared to the Census counts in 2011 and 2016. We demonstrate how the system can be used to investigate the Census 2016 undercount in Ireland, in place of the traditional approach of deploying additional population coverage surveys.
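For context, the traditional dual system estimator that PECADO extends can be written in a few lines. The sketch below uses hypothetical register counts and the textbook Lincoln-Petersen form, without the paper's trimming or heterogeneity test.

```python
def dual_system_estimate(n_a, n_b, n_ab):
    """Textbook dual system (Lincoln-Petersen) estimator of population size:
    n_a  - records found in administrative register A
    n_b  - records found in administrative register B
    n_ab - records linked to both registers
    Assumes independent lists, homogeneous capture probabilities and no erroneous
    enumerations - exactly the assumptions the PECADO system has to relax."""
    return n_a * n_b / n_ab

# Hypothetical register counts for one blocking cell
print(dual_system_estimate(n_a=900_000, n_b=850_000, n_ab=800_000))   # -> 956250.0
```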

The paper will be published in the Journal of the Royal Statistical Society Series A: Statistics in Society.

Estimating means of bounded random variables by betting
Tuesday, May 23, 2023, 4-6pm (GMT)
Took place at the RSS building (Errol St., London) and online

Paper: 'Estimating means of bounded random variables by betting'
Download the preprint.
Download supplementary material. 

DeMO Introduction:
Download the slides
Access the recording on YouTube
 

Authors: Ian Waudby-Smith and Aaditya Ramdas, Carnegie Mellon University, USA.
 
Abstract
This paper derives confidence intervals (CI) and time-uniform confidence sequences (CS) for the classical problem of estimating an unknown mean from bounded observations. We present a general approach for deriving concentration bounds that can be seen as a generalization and improvement of the celebrated Chernoff method. At its heart, it is based on a class of composite non-negative martingales, with strong connections to testing by betting and the method of mixtures. We show how to extend these ideas to sampling without replacement, another heavily studied problem. In all cases, our bounds are adaptive to the unknown variance, and empirically vastly outperform existing approaches based on Hoeffding or empirical Bernstein inequalities and their recent supermartingale generalizations by Howard et al. [2021]. In short, we establish a new state-of-the-art for four fundamental problems: CSs and CIs for bounded means, when sampling with and without replacement.
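A minimal sketch of the betting idea, under assumptions of our own choosing (a crude fixed bet and a grid of candidate means), is given below; the paper's adaptive bets are substantially tighter, so this is illustrative only.

```python
import numpy as np

def betting_cs(x, alpha=0.05, grid=np.linspace(0.001, 0.999, 999)):
    """Confidence sequence for the mean of [0, 1]-valued data via testing by betting.
    For each candidate mean m we grow two nonnegative capital processes (bets that the
    true mean is above/below m); each has expectation 1 if m is correct, so by Ville's
    inequality m can be discarded once the averaged capital exceeds 1/alpha.
    A crude fixed bet is used here - the paper's adaptive bets are much tighter."""
    x = np.asarray(x, dtype=float)
    lam_up = np.minimum(0.75, 0.5 / grid)          # bet sizes kept inside the no-bankruptcy range
    lam_dn = np.minimum(0.75, 0.5 / (1 - grid))
    cap_up = np.ones_like(grid)
    cap_dn = np.ones_like(grid)
    keep = np.ones_like(grid, dtype=bool)
    for xi in x:
        cap_up *= 1 + lam_up * (xi - grid)
        cap_dn *= 1 - lam_dn * (xi - grid)
        keep &= 0.5 * (cap_up + cap_dn) < 1 / alpha    # running intersection keeps time-uniform validity
    sel = grid[keep]
    return (sel.min(), sel.max()) if sel.size else (float("nan"), float("nan"))

rng = np.random.default_rng(0)
print(betting_cs(rng.beta(2, 5, size=500)))   # true mean 2/7, roughly 0.286
```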

The paper will be published in the Journal of the Royal Statistical Society Series B: Statistical Methodology.
 


Martingale Posterior Distributions
Monday, December 12, 2022, 5-7pm GMT
Took place at the RSS building (Errol St., London) and online
DeMO (pre-meeting): 3:30pm (Errol Street and online).

Paper: 'Martingale Posterior Distributions'
Download the preprint.

Authors: Edwin Fong (The Alan Turing Institute and University of Oxford), Chris Holmes (The Alan Turing Institute and University of Oxford) and Stephen G. Walker (University of Texas at Austin, USA)

Abstract
The prior distribution is the usual starting point for Bayesian uncertainty. In this paper, we present a different perspective which focuses on missing observations as the source of statistical uncertainty, with the parameter of interest being known precisely given the entire population. We argue that the foundation of Bayesian inference is to assign a distribution on missing observations conditional on what has been observed. In the i.i.d. setting with an observed sample of size n, the Bayesian would thus assign a predictive distribution on the missing Yn+1:∞ conditional on Y1:n, which then induces a distribution on the parameter. We utilize Doob's theorem, which relies on martingales, to show that choosing the Bayesian predictive distribution returns the conventional posterior as the distribution of the parameter. Taking this as our cue, we relax the predictive machine, avoiding the need for the predictive to be derived solely from the usual prior to posterior to predictive density formula. We introduce the martingale posterior distribution, which returns Bayesian uncertainty on any statistic via the direct specification of the joint predictive. To that end, we introduce new predictive methodologies for multivariate density estimation, regression and classification that build upon recent work on bivariate copulas.
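A toy version of the idea is predictive resampling with a simple Pólya-urn (Bayesian-bootstrap) predictive: repeatedly impute the missing observations from the predictive and recompute the statistic of interest. The sketch below is an illustrative choice of ours, not the copula-based predictives proposed in the paper.

```python
import numpy as np

def martingale_posterior_mean(y, n_draws=1000, horizon=2000, seed=0):
    """Martingale posterior for the mean via predictive resampling.
    Predictive used here: a simple Polya-urn / Bayesian-bootstrap update, i.e. each
    missing observation is drawn from the empirical distribution of everything seen
    so far. Each forward pass imputes Y_{n+1}, ..., Y_{n+horizon}; the mean of the
    completed sample is one draw from the (approximate) martingale posterior."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    draws = np.empty(n_draws)
    for b in range(n_draws):
        urn = list(y)
        for _ in range(horizon):
            urn.append(urn[rng.integers(len(urn))])    # predictive draw given data so far
        draws[b] = np.mean(urn)
    return draws

y = np.random.default_rng(1).normal(loc=2.0, scale=1.0, size=50)
post = martingale_posterior_mean(y)
print(post.mean().round(2), np.quantile(post, [0.025, 0.975]).round(2))
```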

The paper will be published in the Journal of the Royal Statistical Society Series B: Statistical Methodology.



Flexible marked spatio-temporal point processes with applications to event sequences from association football
Tuesday, November 22, 2022, 11am-1pm GMT
DeMO (pre-meeting): 9:30 am
Online 

Paper: 'Flexible marked spatio-temporal point processes with applications to event sequences from association football'.
Download the preprint

Authors: Santhosh Narayanan (The Alan Turing Institute), Ioannis Kosmidis (Department of Statistics, University of Warwick) and Petros Dellaportas (Department of Statistical Science, University College London; Department of Statistics, Athens University of Economics and Business)

Abstract
We develop a new family of marked point processes by focusing the characteristic properties of marked Hawkes processes exclusively on the space of marks, providing the freedom to specify a different model for the occurrence times. This is possible through the decomposition of the joint distribution of marks and times that allows us to separately specify the conditional distribution of marks given the filtration of the process and the current time.

We develop a Bayesian framework for the inference and prediction from this family of marked point processes that can naturally accommodate process and point-specific covariate information to drive cross-excitations, offering wide flexibility and applicability in the modelling of real-world processes. The framework is used here for the modelling of in-game event sequences from association football, resulting not only in inferences about previously unquantified characteristics of game dynamics and extraction of event-specific team abilities, but also in predictions for the occurrence of events of interest, such as goals, corners or fouls in a specified interval of time.

The paper will be published in the Journal of the Royal Statistical Society Series C: Applied Statistics.
 

Statistical Aspects of Climate Change

Wednesday, September 14, 2022, 5-7pm BST
Took place at the RSS Conference, Aberdeen

A multi-paper meeting featuring two discussion papers and organised by the RSS Discussion Meetings Committee and RSS Environmental Statistics Section

Paper 1: ‘Assessing present and future risk of water damage using building attributes, meteorology and topography’
Download the preprint.
Erratum.
Link to supporting data.

Authors: Claudio Heinrich-Mertsching*, Jens Christian Wahl*, Alba Ordonez*, Marita Stien#, John Elvsborg#, Ola Haug*, Thordis L. Thorarinsdottir*
 
* Norwegian Computing Center, Oslo, Norway
# Gjensidige Forsikring ASA, Oslo, Norway

Abstract
Weather-related risk makes the insurance industry inevitably concerned with climate and climate change. Buildings hit by pluvial flooding are a key manifestation of this risk, giving rise to compensations for the induced physical damages and business interruptions. In this work, we establish a nationwide, building-specific risk score for water damage associated with pluvial flooding in Norway. We fit a generalized additive model that relates the number of water damages to a wide range of explanatory variables that can be categorized into building attributes, climatological variables and topographical characteristics. The model assigns a risk score to every location in Norway, based on local topography and climate, which is not only useful for insurance companies, but also for city planning. Combining our model with an ensemble of climate projections allows us to project the (spatially varying) impacts of climate change on the risk of pluvial flooding towards the middle and end of the 21st century.
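As an illustration of the kind of model described, the sketch below fits a regression-spline Poisson GLM (a rough stand-in for the paper's generalized additive model) to hypothetical building-level claim data; all variable names and the simulated data are assumptions of ours.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5_000
rain = rng.gamma(8, 100, n)                    # hypothetical annual rainfall (mm)
slope = rng.uniform(0, 25, n)                  # hypothetical terrain slope (degrees)
age = rng.integers(0, 120, n)                  # hypothetical building age (years)
exposure = rng.uniform(0.5, 1.0, n)            # years insured
lam = exposure * np.exp(-4 + 0.0015 * rain - 0.02 * slope)
df = pd.DataFrame({"n_claims": rng.poisson(lam), "annual_rain": rain,
                   "slope": slope, "building_age": age, "exposure": exposure})

# Smooth (spline) effects of rainfall and slope on the expected number of
# water-damage claims, with an exposure offset -- a rough stand-in for a GAM.
model = smf.glm("n_claims ~ bs(annual_rain, df=4) + bs(slope, df=4) + building_age",
                data=df, family=sm.families.Poisson(),
                offset=np.log(df["exposure"])).fit()
print(model.params.round(3))
```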

Paper 2: 'The importance of context in extreme value analysis with application to extreme temperatures in the USA and Greenland'
Download the preprint.
Link to supporting data

Authors: Daniel Clarkson, Emma Eastoe and Amber Leeson, Lancaster University, UK

Abstract
Statistical extreme value models allow estimation of the frequency, magnitude and spatio-temporal extent of extreme temperature events in the presence of climate change. Unfortunately, the assumptions of many standard methods are not valid for complex environmental data sets, with a realistic statistical model requiring appropriate incorporation of scientific context. We examine two case studies in which the application of routine extreme value methods results in inappropriate models and inaccurate predictions. In the first scenario, record-breaking temperatures experienced in the US in the summer of 2021 are found to exceed the maximum feasible temperature predicted from a standard extreme value analysis of pre-2021 data. Incorporating random effects into the standard methods accounts for additional variability in the model parameters, reflecting shifts in unobserved climatic drivers and permitting greater accuracy in return period prediction. The second scenario examines ice surface temperatures in Greenland. The temperature distribution is found to have a poorly-defined upper tail, with a spike in observations just below 0°C and an unexpectedly large number of measurements above this value. A Gaussian mixture model fit to the full range of measurements is found to improve fit and predictive abilities in the upper tail when compared to traditional extreme value methods.
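A minimal sketch of the second case study's modelling choice, fitting a Gaussian mixture to the full temperature range with scikit-learn, is shown below; the data are simulated placeholders shaped only loosely like the Greenland measurements described.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Simulated stand-in for ice-surface temperatures: a cold bulk plus a spike of
# observations just below 0 C from melting surfaces, as described above.
temps = np.concatenate([rng.normal(-15, 8, 9_000), rng.normal(-0.3, 0.4, 1_000)])

# Fit a Gaussian mixture to the full range of measurements; the fitted density can
# then be interrogated in the poorly-defined upper tail near 0 C.
gmm = GaussianMixture(n_components=3, random_state=0).fit(temps.reshape(-1, 1))

grid = np.linspace(-2.0, 2.0, 5).reshape(-1, 1)
density = np.exp(gmm.score_samples(grid))
for t, d in zip(grid.ravel(), density):
    print(f"density at {t:+.1f} C: {d:.4f}")
```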

The papers will be published in the Journal of the Royal Statistical Society Series C: Applied Statistics.
 


 

Statistical Aspects of the Covid-19 Pandemic 
2nd Multi-paper Discussion Meeting

Took place Thursday, June 16, 2022, 4-6pm BST

Paper 1: ‘Bayesian semi-mechanistic modelling of COVID-19: identifiability, sensitivity, and policy implications’
Download the preprint.

Authors: Samir Bhatt, Neil Ferguson, Seth Flaxman, Axel Gandy, Swapnil Mishra, James Scott

Abstract
We propose a general Bayesian approach to modeling epidemics such as COVID-19. The approach grew out of specific analyses conducted during the pandemic, in particular an analysis concerning the effects of non-pharmaceutical interventions (NPIs) in reducing COVID-19 transmission in 11 European countries. The model parameterizes the time-varying reproduction number Rt through a regression framework in which covariates can be, for example, governmental interventions or changes in mobility patterns. This allows a joint fit across regions and partial pooling to share strength. This innovation was critical to our timely estimates of the impact of lockdown and other NPIs in the European epidemics, whose validity was borne out by the subsequent course of the epidemic. Our framework provides a fully generative model for latent infections and observations deriving from them, including deaths, cases, hospitalizations, ICU admissions and seroprevalence surveys. One issue surrounding our model's use during the COVID-19 pandemic is the confounded nature of NPIs and mobility. We use our framework to explore this issue. We have open-sourced an R package, epidemia, implementing our approach in Stan. Versions of the model are used by New York State, Tennessee and Scotland to estimate the current situation and make policy decisions.

Paper 2: ‘A sequential Monte Carlo approach for estimation of time-varying reproduction numbers for Covid-19’
Download the preprint.

Authors: Geir Storvik, Alfonso Diz-Lois Palomares, Solveig Engebretsen, Gunnar Rø, Kenth Engo-Monsen, Anja Kristoffersen, Birgitte De Blasio, Arnoldo Frigessi

Abstract
The Covid-19 pandemic has required most countries to implement complex sequences of non-pharmaceutical interventions, with the aim of controlling the transmission of the virus in the population. To be able to take rapid decisions, a detailed understanding of the current situation is necessary. Estimates of time-varying, instantaneous reproduction numbers represent a way to quantify the viral transmission in real time. They are often defined through a mathematical compartmental model of the epidemic, like a stochastic SEIR model, whose parameters must be estimated from multiple time series of epidemiological data. Because of very high dimensional parameter spaces (partly due to the stochasticity in the spread models) and incomplete and delayed data, inference is very challenging. We propose a state space formalisation of the model and a sequential Monte Carlo approach which allows us to estimate a daily-varying reproduction number for the Covid-19 epidemic in Norway with sufficient precision, on the basis of daily hospitalisation and positive test incidences. The method was in regular use in Norway during the pandemic and appears to be a powerful instrument for epidemic monitoring and management.



Paper: ‘Vintage Factor Analysis with Varimax Performs Statistical Inference’
Authors: Karl Rohe and Muzhe Zeng, University of Wisconsin-Madison, USA

Took place on Wednesday, 11 May 2022 3-5pm (BST)

Abstract
Psychologists developed Multiple Factor Analysis to decompose multivariate data into a small number of interpretable factors without any a priori knowledge about those factors [Thurstone, 1935]. In this form of factor analysis, the Varimax factor rotation redraws the axes through the multidimensional factors to make them sparse and thus make them more interpretable [Kaiser, 1958].

Charles Spearman and many others objected to factor rotations because the factors seem to be rotationally invariant [Thurstone, 1947, Anderson and Rubin, 1956]. These objections are still reported in all contemporary multivariate statistics textbooks. However, this vintage form of factor analysis has survived and is widely popular because, empirically, the factor rotation often makes the factors easier to interpret. We argue that the rotation makes the factors easier to interpret because, in fact, the Varimax factor rotation performs statistical inference.

We show that Principal Components Analysis (PCA) with the Varimax axes provides a unified spectral estimation strategy for a broad class of semi-parametric factor models, including the Stochastic Blockmodel and a natural variation of Latent Dirichlet Allocation (i.e. 'topic modeling'). In addition, we show that Thurstone's widely employed sparsity diagnostics implicitly assess a key leptokurtic condition that makes the axes statistically identifiable in these models. Taken together, this shows that the know-how of Vintage Factor Analysis performs statistical inference, reversing nearly a century of statistical thinking on the topic. We illustrate the use of these techniques on two large bibliometric examples (a citation network and a text corpus). With a sparse eigensolver, PCA with Varimax is both fast and stable. Combined with Thurstone's straightforward diagnostics, this vintage approach is suitable for a wide array of modern applications.
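A compact sketch of 'PCA with Varimax' on simulated leptokurtic factors is given below; the varimax routine is the standard Kaiser iteration and the toy data are our own assumption, intended only to show the rotated components roughly recovering sparse factors up to sign and permutation.

```python
import numpy as np

def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-6):
    """Kaiser's varimax rotation of a loadings/score matrix Phi (rows: items, cols: factors)."""
    p, k = Phi.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lam = Phi @ R
        u, s, vh = np.linalg.svd(
            Phi.T @ (Lam ** 3 - (gamma / p) * Lam @ np.diag(np.diag(Lam.T @ Lam))))
        R = u @ vh
        d_new = s.sum()
        if d_new < d * (1 + tol):
            break
        d = d_new
    return Phi @ R

rng = np.random.default_rng(0)
Z = rng.exponential(size=(500, 2))                       # leptokurtic latent factors
W = np.array([[2, 0], [2, 0], [0, 2], [0, 2], [1, 1]], dtype=float)
X = Z @ W.T + 0.1 * rng.normal(size=(500, 5))

Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)        # PCA via the SVD
rotated = varimax(U[:, :2])                              # varimax-rotated component scores
# Each rotated component should correlate mainly with one latent factor
print(np.round(np.abs(np.corrcoef(rotated.T, Z.T))[:2, 2:], 2))
```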

Download the preprint

The paper will be published in the Journal of the Royal Statistical Society, Series B.



Paper: 'Experimental Evaluation of Algorithm-Assisted Human Decision-Making: Application to Pretrial Public Safety Assessment'
Authors: Imai et al.
Tuesday, 8 February 2022
To be published in JRSSA.

Abstract
Despite an increasing reliance on fully-automated algorithmic decision-making in our lives, human beings still make consequential decisions. We develop a statistical methodology for experimentally evaluating the causal impacts of algorithmic recommendations on human decisions. We also show how to examine whether algorithmic recommendations improve the fairness of human decisions and derive the optimal decision rules under various settings. We apply the proposed methodology to preliminary data from the first-ever randomized controlled trial that evaluates the pretrial Public Safety Assessment (PSA) in the criminal justice system. A goal of the PSA is to help judges decide which arrested individuals should be released. We find that providing the PSA to the judge has little overall impact on the judge's decisions and subsequent arrestee behavior. However, we find that the PSA may help avoid unnecessarily harsh decisions for female arrestees while it encourages the judge to make stricter decisions for male arrestees who are deemed to be risky. For fairness, the PSA appears to increase the gender bias against males while having little effect on any existing racial differences in judges' decisions. Finally, we find that the PSA's recommendations might be unnecessarily severe unless the cost of a new crime is sufficiently high.
Download the preprint



 

View our playlist of recent Discussion Meetings
Read past Discussion Papers