Making messy data: creating more realistic, synthetic data for teaching and testing

Date: Thursday 10 July 2025, 1.00PM - 2.00PM
Location: Online. A link to join the event will be sent to those who register.
Local Group Meeting
Book now


The RSS Manchester local group invites you to an online seminar with Dr Nicola Rennie (ONS).

Abstract
Equipping students in statistics and data science with the data wrangling skills needed to handle real-world data is a crucial aspect of their education. Real data, unlike the clean, structured examples often used in teaching, can include a variety of challenges such as typographical errors, missing values encoded in unconventional ways, or unexpected spaces in text. These issues, and others stemming from human error or software incompatibilities, are common in real-world datasets. Students need to learn how to address them in order to develop the practical skills required for professional data analysis. Similarly, when developing methodology and the software that implements it, realistic test data is necessary to ensure robustness.

In this session, I'll introduce the 'messy' R package, which is designed to add controlled levels of messiness to existing, clean datasets. This package allows you to retain the structure and simplicity of familiar example datasets while providing students with a realistic, manageable data cleaning experience. I'll also demonstrate some ways in which the package can be used, and discuss the future direction of its development.
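To make the idea of "controlled messiness" concrete, here is a minimal Python sketch of the general technique. It is not the 'messy' package or its API: the function name `add_messiness`, its parameters, and the four kinds of error it applies are all assumptions chosen for illustration, loosely following the problems listed in the abstract (typos, oddly encoded missing values, stray spaces).

```python
import random


def add_messiness(values, messiness=0.2, missing_code="N/A", seed=None):
    """Return a copy of `values` (a list of strings) with some entries made messy.

    Hypothetical helper for illustration only; it does not mirror the
    'messy' R package. `messiness` is the probability that any one entry
    is altered, and `seed` makes the result reproducible.
    """
    rng = random.Random(seed)
    result = []
    for value in values:
        if rng.random() >= messiness:
            result.append(value)  # leave this entry clean
            continue
        kind = rng.choice(["missing", "whitespace", "case", "typo"])
        if kind == "missing":
            # Missing value encoded in an unconventional way
            result.append(missing_code)
        elif kind == "whitespace":
            # Unexpected trailing space
            result.append(value + " ")
        elif kind == "case":
            # Inconsistent capitalisation
            result.append(value.swapcase())
        elif len(value) < 2:
            # Too short to transpose characters; keep as is
            result.append(value)
        else:
            # Typo: transpose two adjacent characters
            i = rng.randrange(len(value) - 1)
            result.append(value[:i] + value[i + 1] + value[i] + value[i + 2:])
    return result


clean = ["Adelie", "Gentoo", "Chinstrap", "Adelie", "Gentoo"]
print(add_messiness(clean, messiness=0.5, seed=1))
```

Fixing the seed means every student receives the same messy dataset, and raising or lowering `messiness` adjusts how demanding the cleaning exercise is.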
 
Dr Rennie is a data visualisation specialist at the Office for National Statistics (ONS) with a background in statistics and data science. She holds a PhD in Statistics and Operational Research from Lancaster University. She co-authored the Royal Statistical Society's Best Practices for Data Visualisation guidance and has written several articles on data visualisation for the Society's Significance magazine, where she is also a member of the Editorial Board.

Dr Rennie is active in the teaching of statistics and in understanding how to communicate complex quantitative ideas effectively and accessibly. She is a committee member and secretary of the Royal Statistical Society (RSS) Teaching Statistics Section Group, and is one of the RSS's 2024-2025 William Guy Lecturers. She was previously a committee member of the R-Ladies Global Team and the organiser of the R-Ladies Lancaster chapter.
 