From Rags to Riches: Using web-scraped data to derive a clothing price index

Date: Wednesday 20 September 2023, 12.00PM

Location: Online

Section Group Meeting

Clothing accounts for approximately 5% of the UK CPI basket and is currently covered by manually collected price data. We obtain web-scraped data from the online shopping websites of the main retailers in the clothing sector. Because web scraping collects far more clothing items than manual price collection, we aim to use it to increase product coverage. Collecting prices daily makes the price data more representative, and covering a wider variety of clothing types improves the granularity of the index.

We process web-scraped textual data using Natural Language Processing (NLP) and machine learning techniques to build a clothing price index. This presentation outlines three of the key pipelines we use to build the index: 1. Clothing Classification, 2. Product Grouping, 3. Index Run.

First, the classification pipeline trains a supervised machine learning model to produce a classification mapper, which maps individual clothing products to narrowly defined clothing consumption segments such as “women’s dresses” or “men’s jeans”. Second, we create a product grouping mapper, which maps each product to a product group using a rules-based method. Grouping is crucial for the clothing price index because the market is seasonal and product turnover is high. Third, we compile the clothing price index using multilateral index methods, which make better use of the dynamic structure of web-scraped data, in which products continually enter and leave the sample.
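
The second and third steps can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sample records, the colour and size rules in `group_key`, and the choice of GEKS-Jevons as the multilateral method. The abstract does not state which grouping rules or which multilateral method the authors use.

```python
import math
import re
from collections import defaultdict

# Hypothetical scraped records: (period, product name, price).
RECORDS = [
    (0, "Slim Fit Jeans Blue 32R", 40.0),
    (0, "Slim Fit Jeans Black 34L", 40.0),
    (0, "Floral Midi Dress Size 10", 30.0),
    (1, "Slim Fit Jeans Blue 30S", 44.0),
    (1, "Floral Midi Dress Size 12", 27.0),
    (2, "Slim Fit Jeans Grey 32R", 44.0),
    (2, "Floral Midi Dress Size 8", 33.0),
]

# Assumed grouping rule: drop colour and size tokens so that
# variants of the same garment share one product group key.
COLOURS = {"blue", "black", "grey"}
SIZE_TOKEN = re.compile(r"^(size|\d+[a-z]?)$")


def group_key(name):
    """Map a product name to a product group key (rules-based)."""
    tokens = name.lower().split()
    kept = [t for t in tokens if t not in COLOURS and not SIZE_TOKEN.match(t)]
    return " ".join(kept)


def group_prices(records):
    """Return {period: {group: geometric mean price}}."""
    logs = defaultdict(list)
    for period, name, price in records:
        logs[(period, group_key(name))].append(math.log(price))
    prices = defaultdict(dict)
    for (period, key), values in logs.items():
        prices[period][key] = math.exp(sum(values) / len(values))
    return dict(prices)


def jevons(prices, a, b):
    """Bilateral Jevons index from period a to b over matched groups."""
    matched = prices[a].keys() & prices[b].keys()
    if not matched:
        raise ValueError(f"no matched groups between periods {a} and {b}")
    total = sum(math.log(prices[b][k] / prices[a][k]) for k in matched)
    return math.exp(total / len(matched))


def geks_jevons(prices, base, t):
    """GEKS index: geometric mean, over every link period l in the
    window, of the chained index jevons(base, l) * jevons(l, t)."""
    periods = sorted(prices)
    total = sum(
        math.log(jevons(prices, base, l) * jevons(prices, l, t))
        for l in periods
    )
    return math.exp(total / len(periods))


prices = group_prices(RECORDS)
index = [geks_jevons(prices, 0, t) for t in sorted(prices)]
```

In this toy data every individual product name disappears after one period, so matching at product level would leave nothing to compare. The group keys persist across periods, which is what lets the index be computed at all. A multilateral method such as GEKS uses every period in the window as a link, so groups that enter or leave partway through can still contribute through the bilateral comparisons in which they are matched.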

In conclusion, this project will allow us to modernise UK consumer price statistics by making better use of new data sources and innovative methods.

Ahmet Aydin: Ahmet.Aydin@ons.gov.uk
Laura Christen: Laura.Christen@ons.gov.uk

Andrew Etherington: for the Official Statistics Section
