Data quality: How to save time and simultaneously improve your results

Date: Thursday 23 April 2026, 12.00PM - 1.00PM
Location: Online. Joining instructions will be sent to those who register.
Local Group Meeting



The RSS Manchester local group will host an online seminar with Professor Roy Ruddle (Professor of Computing and Director of Research Technology at the Leeds Institute for Data Analytics), who will discuss his work on data quality.
 

Did you know that there are more than 100 ways in which data can be of “low” quality? That is one reason why data preparation often takes more than half of a data science project’s time. Professor Roy Ruddle will explain the what, when, how and why of data quality, introducing you to his publicly available 6-step method (https://doi.org/10.5518/1481).

First, he will outline how those many data quality issues have been simplified into a set of plain English tasks and questions for you to answer about your data (the “what”). Then he will explain the order in which you should do those tasks (the “when”), using real examples to illustrate computations and visualizations that you can use to answer each question (the “how”).

Along the way, you will learn how the method enables you to (1) do more in less time, (2) avoid re-work, and (3) avoid errors in your results by understanding any limitations of your data and correcting any mis-held assumptions (the “why”).

Professor Roy Ruddle is a Professor of Computing and Director of Research Technology at the Leeds Institute for Data Analytics. He has worked in both industry and academia, developed the Leeds Virtual Microscope, which is used around the world for cancer diagnosis, and co-founded the company Petriva, which provides specialist visual data analysis and data mining software. Roy is an expert in data visualization and data quality. With input from experts working in 15 industry sectors, he developed a publicly available 6-step data quality method (https://doi.org/10.5518/1481) that is efficient and rigorous, and that offers time savings, cost reductions and improved result accuracy. He has also developed a software implementation of the method: the open-source vizdataquality Python package (https://pypi.org/project/vizdataquality/).

 