RSS Discussion meeting 'Analysis of citizen science data' (in person)

Date: Tuesday 3 September 2024, 4:45pm

Location: DoubleTree by Hilton Brighton Metropole, Kings Rd, Brighton and Hove, Brighton BN1 2FU

'Analysis of citizen science data' (multi-paper Discussion meeting)

The Royal Statistical Society is pleased to invite you to the discussion of three papers at the annual conference in Brighton on 3 September, 4:45pm to 6:45pm. It is free to attend and open to members and non-members.

The event will be chaired by our President, Andrew Garrett.

Paper 1: 'Efficient statistical inference methods for assessing changes in species'
Authors: Emily B Dennis (1,2), Alex Diana (3), Eleni Matechou (2), Byron J T Morgan (2)
((1) Butterfly Conservation, (2) University of Kent, (3) University of Essex)

Abstract: The global decline of biodiversity, driven by habitat degradation and climate breakdown, is a significant concern. Accurate measures of change are crucial to provide reliable evidence of species’ population changes. Meanwhile citizen science data have witnessed a remarkable expansion in both quantity and sources and serve as the foundation for assessing species’ status. The growing data reservoir presents opportunities for novel and improved inference but often comes with computational costs: computational efficiency is paramount, especially as regular analysis updates are necessary. Building upon recent research, we present illustrations of computationally efficient methods for fitting new models, applied to three major citizen science data sets for butterflies. We extend a method for modelling abundance changes of seasonal organisms, firstly to accommodate multiple years of count data efficiently, and secondly for application to counts from a snapshot mass-participation survey. We also present a variational inference approach for fitting occupancy models efficiently to opportunistic citizen science data. The continuous growth of citizen science data offers unprecedented opportunities to enhance our understanding of how species respond to anthropogenic pressures. Efficient techniques in fitting new models are vital for accurately assessing species’ status, supporting policy-making, setting measurable targets, and enabling effective conservation efforts.

Paper 2: 'Frequentist Prediction Sets for Species Abundance using Indirect Information'
Authors: Elizabeth Bersson and Peter D Hoff, Duke University, Durham, USA

Abstract: Citizen science databases that consist of volunteer-led sampling efforts of species communities are relied on as essential sources of data in ecology. Summarising such data across counties with frequentist-valid prediction sets for each county provides an interpretable comparison across counties of varying size or composition. As citizen science data often feature unequal sampling efforts across a spatial domain, prediction sets constructed with indirect methods that share information across counties may be used to improve precision. In this article, we present a nonparametric framework to obtain precise prediction sets for a multinomial random sample based on indirect information that maintain frequentist coverage guarantees for each county. We detail a simple algorithm to obtain prediction sets for each county using indirect information where the computation time does not depend on the sample size and scales nicely with the number of species considered. The indirect information may be estimated by a proposed empirical Bayes procedure based on information from auxiliary data. Our approach makes inference for under-sampled counties more precise, while maintaining area-specific frequentist validity for each county. Our method is used to provide a useful description of avian species abundance in North Carolina, USA based on citizen science data from the eBird database.

Paper 3: 'Extreme-value modelling of migratory bird arrival dates: Insights from citizen science data'
Authors: Jonathan Koh, University of Bern, Switzerland and Thomas Opitz, INRAE, France

Abstract: Citizen science mobilises many observers and gathers huge datasets but often without strict sampling protocols, resulting in observation biases due to heterogeneous sampling effort, which can lead to biased statistical inferences. We develop a spatiotemporal Bayesian hierarchical model for bias-corrected estimation of arrival dates of the first migratory bird individuals at a breeding site. Higher sampling effort could be correlated with earlier observed dates. We implement data fusion of two citizen-science datasets with fundamentally different protocols (BBS, eBird) and map posterior distributions of the latent process, which contains four spatial components with Gaussian process priors: species niche; sampling effort; position and scale parameters of annual first arrival date. The data layer includes four response variables: counts of observed eBird locations (Poisson); presence-absence at observed eBird locations (Binomial); BBS occurrence counts (Poisson); first arrival dates (Generalised Extreme-Value). We devise a Markov Chain Monte Carlo scheme and check by simulation that the latent process components are identifiable. We apply our model to several migratory bird species in the northeastern US for 2001–2021 and find that the sampling effort significantly modulates the observed first arrival date. We exploit this relationship to effectively bias-correct predictions of the true first arrivals.

The papers will be published in Journal of the Royal Statistical Society, Series A.

Please do join us to listen to the authors present them. You are also warmly encouraged to make a short comment on one or more of the papers following the author presentations.

Details of the meeting can be found on our website, where you can also download the preprints:
RSS - Discussion paper meetings

Contact Judith Shorten
