Assessing Fine-Tuned NER Models with Limited Data in French: Automating Detection of New Technologies, Technological Domains, and Startup Names in Renewable Energy

Achieving carbon neutrality by 2050 requires unprecedented technological, economic, and sociological changes. With time as a scarce resource, it is crucial to base decisions on relevant facts and information to avoid misdirection. This study aims to help decision makers quickly find relevant informa...

Bibliographic Details
Main Authors: Connor MacLean, Denis Cavallucci
Format: Article
Language: English
Published: MDPI AG 2024-08-01
Series: Machine Learning and Knowledge Extraction
Subjects: natural language processing; named entity recognition; renewable energy; web-scraping
Online Access: https://www.mdpi.com/2504-4990/6/3/96
author Connor MacLean
Denis Cavallucci
collection DOAJ
description Achieving carbon neutrality by 2050 requires unprecedented technological, economic, and sociological changes. With time as a scarce resource, it is crucial to base decisions on relevant facts and information to avoid misdirection. This study aims to help decision makers quickly find relevant information related to companies and organizations in the renewable energy sector. We propose fine-tuning five RNN and transformer models, previously trained for French, on a new category, “TECH”, which is used to classify technological domains and new products. In addition, because the models are fine-tuned on news related to startups, we note an improvement in the detection of startup and company names in the “ORG” category. We further explore the capacity of the most effective model to accurately predict entities using a small amount of training data, and show its progression as the training set grows from several hundred to several thousand annotations. This analysis demonstrates the potential of these models to extract insights without large corpora, which shortens the long process of annotating custom training data. The approach is used to automatically extract new company mentions, as well as technologies and technology domains currently being discussed in the news, in order to better analyze industry trends. It also makes it possible to group mentions of specific energy domains with the companies that are actively developing new technologies in the field.
format Article
id doaj-art-de1b70a9b3eb429497bf1540b55be0bf
institution OA Journals
issn 2504-4990
language English
publishDate 2024-08-01
publisher MDPI AG
record_format Article
series Machine Learning and Knowledge Extraction
spelling Record doaj-art-de1b70a9b3eb429497bf1540b55be0bf, record timestamp 2025-08-20T01:55:38Z, language eng. Connor MacLean and Denis Cavallucci (both INSA Strasbourg, 67000 Strasbourg, France). Assessing Fine-Tuned NER Models with Limited Data in French: Automating Detection of New Technologies, Technological Domains, and Startup Names in Renewable Energy. Machine Learning and Knowledge Extraction (MDPI AG, ISSN 2504-4990), vol. 6, no. 3, pp. 1953-1968, 2024-08-01. DOI: 10.3390/make6030096. URL: https://www.mdpi.com/2504-4990/6/3/96. Keywords: natural language processing; named entity recognition; renewable energy; web-scraping.
title Assessing Fine-Tuned NER Models with Limited Data in French: Automating Detection of New Technologies, Technological Domains, and Startup Names in Renewable Energy
topic natural language processing
named entity recognition
renewable energy
web-scraping
url https://www.mdpi.com/2504-4990/6/3/96