At the 2021 RSS Conference, the Official Statistics Section organised a session on automating official statistics.
Automation in official statistics offers opportunities to make regular production tasks more efficient, improve resilience and let statisticians spend less time on routine work. Although not a new idea, it has received renewed focus in recent years. This has included Reproducible Analysis (i.e. analysis with a clear audit trail that explains how and why it was carried out) and the use of reproducible analytical pipelines (i.e. the software methods used to make analysis reproducible). These methods have been promoted through the Reproducible Analytical Pipeline (RAP) champions network and the ONS Data Science Campus.
Speakers
- Alexander Newton (Office for National Statistics)
- Arthur Turrell (Data Science Campus, Office for National Statistics)
- Anna Price (Office for Statistics Regulation)
- Michael Hodge (Office for Statistics Regulation)
Reproducible Analytical Pipelines So Far (Alexander Newton)
- The focus of this talk was not on automation but on the work done so far on RAPs and digital transformation in government.
- Most analysis in government requires some manual work (e.g. in Excel, SPSS or SAS) carried out by people in a predefined order. These processes are difficult and time-consuming to run, which makes official statistics (OS) hard to reproduce and harder for the public to trust.
- Reproducibility is key to showing that repeating a process gives the same set of results; it is the foundation for good peer review and audit.
- The best way to make analysis reproducible is to write it as code. For this, it is good practice to use: open source software; peer review; documentation; version control; and open source code.
- RAP is analysis as code. The foundations of RAP are: code written with open source software (R/Python); use of version control (GitHub/GitLab); embedded documentation; automated QA; minimisation of manual steps (i.e. automation in service of other goals); and maximisation of transparency.
- Using RAP to produce OS leads to: improved quality; increased trust; more efficient processes; better business continuity; and better knowledge management.
- In practice, such a project consists of a single repository in which local CSV files feed into a RAP package that extracts the data, analyses and visualises it, and writes a data product and report to output.
- To successfully convert an old analysis process to a RAP, teams need: trust from senior managers; commitment from team members; enough time for members to contribute; a basic understanding of coding; the right tools in place; a process to transition from BAU; and an agile approach to project management.
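As a rough illustration of the "analysis as code" pattern described in this talk, here is a minimal Python sketch, assuming a hypothetical input CSV with `region` and `value` columns. The function names (`extract`, `quality_assure`, `analyse`, `write_outputs`, `run_pipeline`) are illustrative and not taken from any government RAP package. It covers the extract, automated QA, analyse and write steps (visualisation is left out to keep it to the standard library), and returns a hash of the data product so that a rerun can be checked for identical results.

```python
import csv
import hashlib
import statistics
from collections import defaultdict
from pathlib import Path


def extract(path: Path) -> list[dict]:
    """Read the raw input CSV (hypothetical columns: region, value) into row dicts."""
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


def quality_assure(rows: list[dict]) -> None:
    """Automated QA: fail loudly instead of publishing bad data."""
    if not rows:
        raise ValueError("input file is empty")
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        if not row["region"] or not row["value"]:
            raise ValueError(f"line {line_no}: missing region or value")
        if float(row["value"]) < 0:
            raise ValueError(f"line {line_no}: negative value")


def analyse(rows: list[dict]) -> dict[str, float]:
    """Mean value per region, sorted by region so output order is deterministic."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["region"]].append(float(row["value"]))
    return {region: round(statistics.mean(vals), 2) for region, vals in sorted(groups.items())}


def write_outputs(summary: dict[str, float], out_dir: Path) -> None:
    """Write the data product (CSV) and a short plain-text report."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with (out_dir / "summary.csv").open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "mean_value"])
        writer.writerows(summary.items())
    lines = ["Summary report", ""] + [f"- {region}: {value}" for region, value in summary.items()]
    (out_dir / "report.txt").write_text("\n".join(lines) + "\n")


def run_pipeline(input_csv: Path, out_dir: Path) -> str:
    """Run every step with no manual intervention; return a SHA-256 of the data product."""
    rows = extract(input_csv)
    quality_assure(rows)
    summary = analyse(rows)
    write_outputs(summary, out_dir)
    return hashlib.sha256((out_dir / "summary.csv").read_bytes()).hexdigest()
```

Running `run_pipeline` twice on the same input and comparing the returned hashes is a cheap reproducibility check that could itself be automated, for example in a CI job on GitHub or GitLab.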
Automation at the Data Science Campus (Arthur Turrell)
- Arthur began with the key takeaways from the talk: automation is brilliant; automation is difficult (especially with legacy technology); automation is more than coding; and public sector bodies can learn from each other.
- The reasons to automate processes are: productivity; reduced costs; reliability; reproducibility; testing (easier to do with a computer than by a human); and making jobs more interesting (i.e. automating the boring stuff).
- Processes that can be automated include: statistics; analysis; and research report updating.
- Arthur then talked through an example: a traffic camera automation project that the Data Science Campus carried out in collaboration with Newcastle University. The purpose of the project was to gauge the busyness of urban areas using existing camera technology. He discussed issues around artifacts, missing data, privacy, validation, scaling, security and cost. The output of the project was an additional ONS real-time indicator of economic activity, covering traffic busyness, published weekly.
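The talk did not describe the Campus's actual methodology, so the following is only a hypothetical Python sketch of one issue mentioned, missing data, when turning per-camera daily counts into a weekly figure. The input shape, the `weekly_indicator` function and the 80% `MIN_COVERAGE` threshold are all assumptions made for illustration, not ONS parameters.

```python
from datetime import date
from statistics import mean
from typing import Optional

# Hypothetical rule: publish a week only if at least 80% of camera-days were observed.
MIN_COVERAGE = 0.8


def weekly_indicator(
    daily_counts: dict[date, list[Optional[int]]],
    min_coverage: float = MIN_COVERAGE,
) -> dict[tuple[int, int], Optional[float]]:
    """Average per-camera daily counts into ISO weeks.

    daily_counts maps each day to one entry per camera; None marks a camera
    that produced no usable count that day (missing data).
    Returns {(iso_year, iso_week): mean count, or None if the week is suppressed}.
    """
    weeks: dict[tuple[int, int], list[Optional[int]]] = {}
    for day, counts in daily_counts.items():
        iso_year, iso_week, _ = day.isocalendar()
        weeks.setdefault((iso_year, iso_week), []).extend(counts)

    result: dict[tuple[int, int], Optional[float]] = {}
    for week, counts in sorted(weeks.items()):
        observed = [c for c in counts if c is not None]
        coverage = len(observed) / len(counts) if counts else 0.0
        # Suppress rather than publish a figure based on too little data.
        result[week] = round(mean(observed), 1) if observed and coverage >= min_coverage else None
    return result
```

Suppressing low-coverage weeks is only one design choice; imputing missing camera-days is another, and either decision would need the kind of validation Arthur described before a figure is published.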
The Regulator: Advocating for RAP Principles in Government Analysis (Anna Price and Michael Hodge)
- The Office for Statistics Regulation (OSR) is the regulatory arm of the UK Statistics Authority. They promote and safeguard the production and publication of official statistics. They do not produce their own statistics and are separate from the ONS.
- The OSR’s role is to govern: how statistics are produced; how statistics are used; and how statistics are valued.
- Specifically, their work involves: upholding the Code of Practice for statistics; designation of national statistics; carrying out systemic reviews; and undertaking casework.
- The OSR have also become more involved in non-official statistics and have been supporting RAPs through assessments and reviews.
- The speakers discussed their organisational vision for RAPs to support the highest standards of OS and highlighted that RAPs enhance: trust in statistics; quality; and value.
- OSR wants RAP principles to be the default approach to analytical work in government. A recent OSR report highlighted that achieving this will need: a shared understanding of RAP across government; support from senior leaders; a strategy set by the Government Statistical Service (GSS); increased programming and code management skills; increased mentoring support; and an unblocking of barriers to tools and systems.
- The speakers encouraged those considering using RAPs in their department to contact OSR.