Use of deep learning-based NLP models for full-text data elements extraction for systematic literature review tasks

Bibliographic Details
Main Authors: Jingcheng Du, Dong Wang, Bin Lin, Long He, Liang-Chin Huang, Jingqi Wang, Frank J. Manion, Yeran Li, Nicole Cossrow, Lixia Yao
Format: Article
Language: English
Published: Nature Portfolio 2025-06-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-03979-5
collection DOAJ
description Abstract Systematic literature review (SLR) is an important tool for Health Economics and Outcomes Research (HEOR) evidence synthesis. SLRs involve identifying and selecting pertinent publications and extracting relevant data elements from full-text articles, which can be a manually intensive procedure. Previously, we developed machine learning models to automatically identify relevant publications based on pre-specified inclusion and exclusion criteria. This study investigates the feasibility of applying Natural Language Processing (NLP) approaches to automatically extract data elements from the relevant scientific literature. First, 239 full-text articles were collected and annotated for 12 important variables, including study cohort, lab technique, and disease type, to support SLR summaries of Human papillomavirus (HPV) Prevalence, Pneumococcal Epidemiology, and Pneumococcal Economic Burden. The three resulting annotated corpora are shared publicly at https://github.com/Merck/NLP-SLR-corpora to provide training data and a benchmark baseline for the NLP community to further research this challenging task. We then compared three classic Named Entity Recognition (NER) algorithms, namely Conditional Random Fields (CRF), Long Short-Term Memory (LSTM), and Bidirectional Encoder Representations from Transformers (BERT) models, to assess performance on the data element extraction task. The annotated corpora contain 4,498, 579, and 252 annotated entity mentions for the HPV Prevalence, Pneumococcal Epidemiology, and Pneumococcal Economic Burden tasks, respectively. Deep learning algorithms achieved superior performance in recognizing the targeted SLR data elements compared to conventional machine learning algorithms. LSTM models achieved micro-averaged F1 scores of 0.890, 0.646, and 0.615 on the three tasks, respectively. CRF models could not provide comparable performance on most of the elements of interest. Although BERT-based models are known to generally achieve superior performance on many NLP tasks, we did not observe such an improvement on our three tasks. Overall, deep learning algorithms outperformed conventional machine learning models on multiple SLR data element extraction tasks. The LSTM model, in particular, is preferable for deployment in supporting HEOR SLR data element extraction, due to its better performance, generalizability, and scalability, as it is cost-effective on our SLR benchmark datasets.
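The abstract reports micro-averaged F1 scores for the entity extraction tasks. As a minimal sketch of how such a score can be computed, the Python snippet below pools true positives, false positives, and false negatives across all entity types before computing precision and recall. It assumes exact span-and-type matching, which the abstract does not specify; the `Mention` tuple layout, the `micro_f1` helper, and the toy annotations are hypothetical and are not taken from the shared corpora or the authors' evaluation code.

```python
from typing import Iterable, Set, Tuple

# Hypothetical mention layout: (document id, start offset, end offset, entity type).
Mention = Tuple[str, int, int, str]


def micro_f1(gold: Iterable[Mention], predicted: Iterable[Mention]) -> float:
    """Micro-averaged F1, assuming exact span-and-type matching."""
    gold_set: Set[Mention] = set(gold)
    pred_set: Set[Mention] = set(predicted)
    # Counts are pooled over all entity types, which is what "micro" means here.
    tp = len(gold_set & pred_set)
    fp = len(pred_set - gold_set)
    fn = len(gold_set - pred_set)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up toy annotations for illustration only.
gold = [
    ("doc1", 0, 12, "study_cohort"),
    ("doc1", 40, 43, "lab_technique"),
    ("doc2", 5, 20, "disease_type"),
    ("doc2", 30, 38, "study_cohort"),
]
predicted = [
    ("doc1", 0, 12, "study_cohort"),
    ("doc1", 40, 43, "lab_technique"),
    ("doc2", 5, 18, "disease_type"),  # wrong end offset, so it counts as FP and FN
]

score = micro_f1(gold, predicted)
print(round(score, 3))
```

Because counts are pooled before averaging, frequent entity types dominate the score, which may matter when comparing corpora of very different sizes such as the three described above.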
format Article
id doaj-art-5a149780dee94762b1811c4f5905fb4e
institution OA Journals
issn 2045-2322
language English
publishDate 2025-06-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-5a149780dee94762b1811c4f5905fb4e; 2025-08-20T02:30:46Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-06-01; volume 15, issue 1, pages 1-18; 10.1038/s41598-025-03979-5
Author affiliations: Jingcheng Du (Intelligent Medical Objects), Dong Wang (Merck & Co., Inc.), Bin Lin (Intelligent Medical Objects), Long He (Intelligent Medical Objects), Liang-Chin Huang (Intelligent Medical Objects), Jingqi Wang (Intelligent Medical Objects), Frank J. Manion (Intelligent Medical Objects), Yeran Li (Merck & Co., Inc.), Nicole Cossrow (Merck & Co., Inc.), Lixia Yao (Merck & Co., Inc.)
title Use of deep learning-based NLP models for full-text data elements extraction for systematic literature review tasks
url https://doi.org/10.1038/s41598-025-03979-5