Using open data sources: RSS Merseyside Local Group meeting report

On 24th June, the RSS Merseyside Local Group and HiPyLiv hosted a joint event centred on large open data sources, their benefits, and how to access and use them. The event featured three talks from invited speakers, a panel discussion, and a live coding demo. HiPyLiv are a local Python community learning group, started in 2016, who offer free workshop-style events in which peers work together to develop their coding skills.

The event was hosted at the University of Liverpool and was our first in-person event since 2019. It was attended by 30 people at career stages ranging from undergraduate to faculty member. A recording of the event is available to watch on the RSS Merseyside YouTube channel and has so far received over 60 views.

The first speaker, Joseph Allen (N Brown PLC), gave a practical guide to “Working with Twitter data”. Joe discussed the unique benefits of Twitter data as a real-time indicator of opinions and events within the general population. Working through a pre-prepared interactive Jupyter notebook, Joe presented code to handle 7,000 tweets describing COVID-19-like symptoms during the last UK winter. He then used these data to track symptoms over time against case numbers, and demonstrated some information retrieval techniques to process and visualise this complex ‘unstructured text’ data.
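As a rough illustration of this kind of processing, the sketch below flags tweets that mention symptom keywords and counts them per ISO week, producing a series that could be plotted against weekly case numbers. It is not the code from Joe's notebook: the keyword list, the `(date, text)` input format, and the function names are hypothetical, and real tweets would need more careful handling (negation, misspellings, retweets).

```python
import re
from collections import Counter
from datetime import date

# Hypothetical keyword list; not taken from the talk.
SYMPTOM_TERMS = ["cough", "fever", "sore throat", "loss of smell", "loss of taste"]

# Match each term at the start of a word, case-insensitively, so that
# "Coughing" or "feverish" also count as mentions.
SYMPTOM_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in SYMPTOM_TERMS) + r")",
    re.IGNORECASE,
)


def mentions_symptom(text: str) -> bool:
    """Return True if the tweet text contains any symptom term."""
    return SYMPTOM_PATTERN.search(text) is not None


def weekly_symptom_counts(tweets):
    """Count symptom-mentioning tweets per (ISO year, ISO week).

    `tweets` is an iterable of (date, text) pairs (an assumed input format).
    """
    counts = Counter()
    for day, text in tweets:
        if mentions_symptom(text):
            iso_year, iso_week, _ = day.isocalendar()
            counts[(iso_year, iso_week)] += 1
    return counts


# Invented example tweets, for illustration only.
sample_tweets = [
    (date(2021, 1, 4), "Woke up with a fever and a cough"),
    (date(2021, 1, 5), "Lovely walk in the park today"),
    (date(2021, 1, 12), "Sore throat again, staying in"),
]

print(weekly_symptom_counts(sample_tweets))
# Counter({(2021, 1): 1, (2021, 2): 1})
```

Keyword matching is only the simplest form of information retrieval; it trades recall (tweets phrased unusually are missed) for transparency, which suits a teaching notebook.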

The second speaker, Dr Joshua Longbottom (Liverpool School of Tropical Medicine), presented “Open data & vector-borne disease modelling – leveraging remotely sensed data”, showing how this powerful combination can be used to investigate the spatial risk of Chikungunya virus infection. Joshua demonstrated how to use the Google Earth web platform to select, process, and extract satellite-derived data, before presenting a machine learning approach to species distribution modelling that maps disease risk based on vegetation and climatic suitability.

Our final speaker, Prof. Dani Arribas-Bel (University of Liverpool), argued that we should be “Open by default - developing reproducible, computational research”. Dani discussed the principles of reproducibility and open platform science, and gave examples of such practice from the Urban Grammar project of the University of Liverpool and the Alan Turing Institute, which aims to characterise urban spaces by their form and function. Dani highlighted that, in using others’ open software packages, the project has produced new code that has since been contributed back to those packages, emphasising the role of community in open science.

The talks were followed by a short panel discussion, in which the speakers answered audience questions on version control and data quality. After a break, the afternoon session was led by Dr Robert Treharne and Sam Ball of HiPyLiv, who guided attendees through live, follow-along Python coding in a notebook to extract and analyse script text from ‘The Simpsons’. Attendees learned how to clean and summarise lines spoken by the show’s characters, merge and filter them against a tabular dataset of character traits, and set up a simple Markov chain to generate new script text.
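The three workshop steps can be sketched in plain Python as follows. This is not the HiPyLiv notebook: the script lines and the `traits` mapping (standing in for the tabular character dataset) are invented toy data, and the Markov chain is a minimal first-order, word-level version in which each next word is drawn at random from the words that followed the current word in the training lines.

```python
import random
import re
from collections import Counter, defaultdict

# Invented toy data standing in for the script and character-trait tables.
script_lines = [
    ("Homer Simpson", "Mmm, donuts."),
    ("Homer Simpson", "D'oh! I forgot the donuts."),
    ("Lisa Simpson", "Dad, donuts are not a food group."),
    ("Bart Simpson", "I didn't do it!"),
]
traits = {"Homer Simpson": "adult", "Lisa Simpson": "child", "Bart Simpson": "child"}

START, END = "<s>", "</s>"  # sentinel tokens marking line boundaries


def clean(text):
    """Lower-case a line, drop punctuation except apostrophes, split into words."""
    return re.sub(r"[^a-z' ]", "", text.lower()).split()


# 1. Summarise: number of lines per character.
lines_per_character = Counter(character for character, _ in script_lines)

# 2. Merge and filter: keep lines whose speaker has the "adult" trait.
adult_lines = [clean(text) for character, text in script_lines
               if traits.get(character) == "adult"]


# 3. Build a first-order Markov chain: word -> list of observed next words.
def build_chain(token_lines):
    chain = defaultdict(list)
    for tokens in token_lines:
        padded = [START] + tokens + [END]
        for current, following in zip(padded, padded[1:]):
            chain[current].append(following)
    return chain


def generate(chain, rng, max_words=20):
    """Walk the chain from START until END or until max_words words are produced."""
    word, output = START, []
    while len(output) < max_words:
        # Repeated entries make frequently observed transitions more likely.
        word = rng.choice(chain[word])
        if word == END:
            break
        output.append(word)
    return " ".join(output)


chain = build_chain(adult_lines)
print(lines_per_character.most_common(1))  # [('Homer Simpson', 2)]
print(generate(chain, random.Random(42)))
```

With only two training lines the chain can do little more than reproduce them; trained on a full script, shared words such as “the” or “donuts” link many lines together, which is what lets the chain produce new combinations.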

The RSS Merseyside Local Group will host a further in-person meeting on another skills-based topic next year.
 