NLP-enhanced inflation measurement using BERT and web scraping

In this research note, we explore the integration of natural language processing (NLP) and web scraping techniques to develop a custom price index for measuring inflation. Using the Harmonized Index of Consumer Prices (HICP) as a benchmark, we created a database of consumer electronics product data...

Bibliographic Details
Main Authors: Martin Berki, Vanesa Andicsova, Milos Oravec
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-04-01
Series: Frontiers in Artificial Intelligence
Online Access: https://www.frontiersin.org/articles/10.3389/frai.2025.1520659/full
collection DOAJ
description In this research note, we explore how natural language processing (NLP) and web scraping can be combined to build a custom price index for measuring inflation. Using the Harmonized Index of Consumer Prices (HICP) as a benchmark, we created a database of consumer electronics product data through web scraping. A BERT model classified approximately 10,000 items into COICOP categories with an accuracy of 94.56%, a macro precision of 79.41%, and a weighted precision of 94.07% on validation data. Our custom index, particularly with the weighted and median methodologies, showed closer alignment with the official HICP while capturing more detailed price fluctuations within the market. Monthly inflation trends revealed variability that reflects price changes in the COICOP 091 category, in contrast to the relative stability of the official HICP. This work provides an alternative perspective on inflation measurement and highlights the potential of computational approaches to enhance economic analysis.
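
The description reports a macro precision (79.41%) well below the weighted precision (94.07%). A gap of this kind typically appears when classes are imbalanced and the smaller classes are predicted less reliably. The following is a minimal sketch of how the three metrics are commonly defined, not the authors' evaluation code; the function name `precision_report`, the class labels, and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def precision_report(y_true, y_pred):
    """Return (accuracy, macro precision, weighted precision) for two label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # number of true items per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    # Per-class precision: correct predictions / all predictions of that class.
    # A class that is never predicted gets precision 0 (a common convention,
    # assumed here).
    per_class = {
        c: (correct[c] / predicted[c] if predicted[c] else 0.0) for c in labels
    }

    accuracy = sum(correct.values()) / len(y_true)
    # Macro: every class counts equally, regardless of size.
    macro = sum(per_class.values()) / len(labels)
    # Weighted: each class counts in proportion to its number of true items.
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return accuracy, macro, weighted


# Hypothetical, imbalanced toy data: one large class and two small ones.
y_true = ["tv"] * 90 + ["camera"] * 8 + ["drone"] * 2
y_pred = ["tv"] * 90 + ["camera"] * 6 + ["tv"] * 2 + ["camera"] * 2

accuracy, macro, weighted = precision_report(y_true, y_pred)
print(f"accuracy={accuracy:.4f} macro={macro:.4f} weighted={weighted:.4f}")
```

In this toy case the small "drone" class is never predicted, which pulls the macro average down sharply while barely affecting the weighted average.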
id doaj-art-38557265a0f54aefa69ae2b6540ea286
institution DOAJ
issn 2624-8212
topic inflation measurement
natural language processing
web scraping
BERT
price index
economic analysis