With the advent of large-scale sky surveys, astronomy is entering an era of unprecedented data complexity and scale, enabling exploration of foundational questions about the Universe. As datasets grow in size and dimension, so too does the importance of addressing selection effects and other data-collection biases that can distort scientific inference. This talk addresses the fundamental problem of non-representative data in modern astrophysical surveys by developing methodology at the intersection of statistical machine learning, Bayesian statistics, and causal inference.
The first part of the talk introduces StratLearn, a statistically principled and theoretically justified method for supervised learning under covariate shift, built on propensity score stratification from causal inference. StratLearn improves generalization when the training data are not representative of the target population, and it shows strong empirical performance for Type Ia supernova classification and photometric redshift estimation. Building on this, I present a Bayesian hierarchical model that uses full conditional photometric redshift density estimates (obtained via StratLearn) to calibrate galaxy redshift distributions for weak-lensing tomography, yielding nearly unbiased estimates of the target population means. Finally, I discuss a hierarchical Bayesian model that combines data from non-representative X-ray and optical surveys to infer galaxy luminosity distributions, explicitly modeling each survey's distinct incompleteness mechanism within a single probabilistic framework. Together, this work provides both general-purpose statistical methodology and approaches tailored to current problems in astrophysics, supporting statistically principled, scientifically justified, and computationally efficient analyses that may also prove useful in other application domains facing similar challenges.
Dr. Maximilian (Max) Autenrieth is a Postdoctoral Research Fellow in Statistics at the University of Cambridge, jointly affiliated with the Department of Pure Mathematics and Mathematical Statistics and the Institute of Astronomy. He completed his PhD in Statistics at Imperial College London in 2023. His research develops principled Bayesian and machine learning methods for scientific inference with complex, large-scale, and non-representative data, with a focus on applications in astrophysics and cosmology. Max’s doctoral work introduced new approaches for domain adaptation and hierarchical modeling in astronomical surveys, leading to improved inference from incomplete, heterogeneous datasets. He is currently Program Chair of the American Statistical Association’s (ASA) Astrostatistics Interest Group and an active member of the CHASC International Astrostatistics Center. His work has been recognized with several awards, including the Royal Statistical Society Emerging Applications Thesis Prize, the ASA Astrostatistics Student Paper Competition (winner, 2024), and the ISBA Best Poster Award (2022, 2024).
Member - FREE
Non-member - £10.00