A comparative analysis of large language models versus traditional information extraction methods for real-world evidence of patient symptomatology in acute and post-acute sequelae of SARS-CoV-2.
<h4>Background</h4>Patient symptoms, crucial for disease progression and diagnosis, are often captured in unstructured clinical notes. Large language models (LLMs) offer potential advantages in extracting patient symptoms compared to traditional rule-based information extraction (IE) systems.
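The study summarized in this record scores symptom extraction as per-symptom classification and reports macro-averaged F1. As a purely illustrative sketch (the labels below are invented, not taken from the study's data), macro-F1 averages per-class F1 scores without weighting by class frequency:

```python
# Hypothetical toy example: macro-averaged F1 computed from scratch.
# The label sequences are invented for illustration only.

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

# 1 = positive symptom mention, 0 = absent/negated mention (toy data)
print(macro_f1([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0]))  # → 0.75
```

Because each class's F1 counts equally in the mean, rare symptom mentions contribute as much to the reported score as frequent ones, which presumably motivates its use here.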
| Main Authors: | Vedansh Thakkar, Greg M Silverman, Abhinab Kc, Nicholas E Ingraham, Emma K Jones, Samantha King, Genevieve B Melton, Rui Zhang, Christopher J Tignanelli |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Public Library of Science (PLoS), 2025-01-01 |
| Series: | PLoS ONE |
| Online Access: | https://doi.org/10.1371/journal.pone.0323535 |
| _version_ | 1850269880226414592 |
|---|---|
| author | Vedansh Thakkar, Greg M Silverman, Abhinab Kc, Nicholas E Ingraham, Emma K Jones, Samantha King, Genevieve B Melton, Rui Zhang, Christopher J Tignanelli |
| author_sort | Vedansh Thakkar |
| collection | DOAJ |
| description | <h4>Background</h4>Patient symptoms, crucial for disease progression and diagnosis, are often captured in unstructured clinical notes. Large language models (LLMs) offer potential advantages in extracting patient symptoms compared to traditional rule-based information extraction (IE) systems.<h4>Methods</h4>This study compared fine-tuned LLMs (LLaMA2-13B and LLaMA3-8B) against BioMedICUS, a rule-based IE system, for extracting symptoms related to acute and post-acute sequelae of SARS-CoV-2 from clinical notes. The study used three corpora: UMN-COVID, UMN-PASC, and N3C-COVID. Prevalence, keyword, and fairness analyses were conducted to assess symptom distribution and model equity across demographics.<h4>Results</h4>BioMedICUS outperformed the fine-tuned LLMs in most cases. On the UMN-PASC dataset, BioMedICUS achieved a macro-averaged F1-score of 0.70 for positive mention detection, compared to 0.66 for LLaMA2-13B and 0.62 for LLaMA3-8B. On the N3C-COVID dataset, BioMedICUS scored 0.75, while LLaMA2-13B and LLaMA3-8B scored 0.53 and 0.68, respectively, for positive mention detection. However, the LLMs performed better in specific instances, such as detecting positive mentions of change in sleep in the UMN-PASC dataset, where LLaMA2-13B (0.79) and LLaMA3-8B (0.65) outperformed BioMedICUS (0.60). In the fairness analysis, BioMedICUS generally showed stronger performance across patient demographics. Keyword analysis using ANOVA on symptom distributions across all three corpora showed that both corpus (df = 2, p < 0.001) and symptom (df = 79, p < 0.001) have a statistically significant effect on log-transformed term frequency-inverse document frequency (TF-IDF) values, with corpus accounting for 52% of the variance in log-TF-IDF values and symptom for 35%.<h4>Conclusion</h4>While BioMedICUS generally outperformed the LLMs, the latter showed promising results in specific areas; LLaMA3-8B in particular performed well at identifying negative symptom mentions. However, both LLaMA models faced challenges in demographic fairness and generalizability. These findings underscore the need for diverse, high-quality training datasets and robust annotation processes to enhance LLMs' performance and reliability in clinical applications. |
| format | Article |
| id | doaj-art-734c7e80dc7e46b8b86e6a6e3b832e2b |
| institution | OA Journals |
| issn | 1932-6203 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | Public Library of Science (PLoS) |
| record_format | Article |
| series | PLoS ONE |
| title | A comparative analysis of large language models versus traditional information extraction methods for real-world evidence of patient symptomatology in acute and post-acute sequelae of SARS-CoV-2. |
| title_sort | comparative analysis of large language models versus traditional information extraction methods for real world evidence of patient symptomatology in acute and post acute sequelae of sars cov 2 |
| url | https://doi.org/10.1371/journal.pone.0323535 |
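The keyword analysis described in this record (ANOVA on log-transformed TF-IDF, with corpus explaining about 52% of the variance) can be sketched roughly as follows. Everything below is an assumption for illustration: the TF-IDF variant, the toy counts, and the single-factor eta-squared helper are not the study's actual pipeline, which fit a two-factor model over 80 symptom terms.

```python
# Hypothetical sketch: log-transformed TF-IDF values grouped by corpus,
# with eta-squared (share of variance explained by the grouping factor,
# analogous to the variance decomposition the abstract reports).
# All counts below are invented for illustration.
import math

def log_tfidf(tf, df, n_docs):
    """log(1 + tf * idf), using a standard idf = log(N / df) (assumed variant)."""
    return math.log1p(tf * math.log(n_docs / df))

def eta_squared(values_by_group):
    """SS_between / SS_total for a single factor."""
    all_vals = [v for vals in values_by_group.values() for v in vals]
    grand = sum(all_vals) / len(all_vals)
    ss_total = sum((v - grand) ** 2 for v in all_vals)
    ss_between = sum(
        len(vals) * ((sum(vals) / len(vals)) - grand) ** 2
        for vals in values_by_group.values()
    )
    return ss_between / ss_total

# Toy per-corpus log-TF-IDF values for a handful of symptom terms
groups = {
    "UMN-COVID": [log_tfidf(tf, 5, 1000) for tf in (3, 4, 5)],
    "UMN-PASC":  [log_tfidf(tf, 20, 1000) for tf in (8, 9, 10)],
    "N3C-COVID": [log_tfidf(tf, 50, 1000) for tf in (1, 2, 2)],
}
print(round(eta_squared(groups), 2))
```

An eta-squared near the reported 0.52 for the corpus factor would indicate, as the abstract concludes, that which corpus a note comes from drives keyword weight more than any other single factor.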