Using large language models for enhanced fraud analysis and detection in blockchain based health insurance claims

Bibliographic Details
Main Authors: Ruba Islayem, Senay Gebreab, Walaa AlKhader, Ahmad Musamih, Khaled Salah, Raja Jayaraman, Muhammad Khurram Khan
Format: Article
Language: English
Published: Nature Portfolio 2025-08-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-15676-4
Description
Summary: Traditional health insurance claim processing systems are plagued by inefficiencies and vulnerabilities, often resulting in significant financial losses due to fraudulent activities. Existing fraud detection methods are largely manual, time-consuming, and inadequate for handling the complexity and scale of modern fraudulent schemes. Moreover, the trust-based relationships between insurers and healthcare providers lack mechanisms to ensure data integrity and prevent manipulation. While several blockchain-based systems have been proposed to improve transparency and tamper resistance, they typically focus on structured data and predefined fraud types, offering limited adaptability and analytical insight. This paper proposes a novel solution leveraging blockchain technology and Large Language Models (LLMs) to transform fraud detection. The system uses Ethereum smart contracts (SCs) to securely store medical records and claim details on a decentralized, tamper-proof ledger that ensures data integrity, traceability, and accountability. This immutable data is accessed by an LLM via a Retrieval-Augmented Generation (RAG) system, which enables intelligent retrieval and analysis of relevant clinical information to detect fraud patterns and inconsistencies. To support complex scenarios involving free-text documents, unstructured clinical data, such as lab reports, are stored using decentralized off-chain storage and retrieved during LLM analysis. In addition, an LLM-powered chatbot allows insurance providers to interact with the system in natural language for claim inquiries, explanations, and summaries. The architecture, sequence diagrams, and implementation algorithms outline the development process, while testing scenarios demonstrate the system’s ability to detect fraud such as inflated costs, unnecessary treatments, and unrendered services.
Evaluation using both synthetic and public clinical datasets showed strong performance, with the LLM achieving up to 99% fraud detection accuracy. Cost, security, and scalability analyses confirm the system’s practicality and resilience, with the complete detection process executing in just 13 seconds. By overcoming the limitations of traditional systems, this framework offers a scalable and adaptable approach for healthcare and other domains. The SCs and source code are publicly available on GitHub.
ISSN: 2045-2322
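
Illustrative note: the summary describes claim data held on a tamper-resistant ledger while free-text documents such as lab reports live in off-chain storage. One common way to connect the two, sketched below, is to record a digest of each off-chain document on the ledger and re-check it at retrieval time. This is only a minimal sketch under stated assumptions: the summary does not say that the authors use SHA-256 digests, and `ClaimLedger`, `register`, `verify`, and the sample claim ID and report text are hypothetical names and data, with an in-memory dictionary standing in for a smart contract.

```python
import hashlib


class ClaimLedger:
    """In-memory stand-in for an on-chain registry (hypothetical, not the paper's contract)."""

    def __init__(self) -> None:
        # Maps a claim ID to the SHA-256 hex digest of its off-chain document.
        self._digests: dict[str, str] = {}

    def register(self, claim_id: str, document: bytes) -> str:
        """Record the digest of a document once; later overwrites are rejected."""
        if claim_id in self._digests:
            raise ValueError(f"claim {claim_id!r} is already registered")
        digest = hashlib.sha256(document).hexdigest()
        self._digests[claim_id] = digest
        return digest

    def verify(self, claim_id: str, document: bytes) -> bool:
        """Return True only if the document matches the digest recorded for the claim."""
        expected = self._digests.get(claim_id)
        if expected is None:
            return False
        return hashlib.sha256(document).hexdigest() == expected


# Assumed sample data for illustration only.
ledger = ClaimLedger()
report = b"Lab report: HbA1c 5.4%, no further treatment indicated."
ledger.register("claim-001", report)

intact = ledger.verify("claim-001", report)
tampered = ledger.verify("claim-001", report + b" Additional MRI billed.")
print(intact, tampered)
```

In a pipeline like the one the summary outlines, a check of this kind would plausibly run before a retrieved document is passed to the LLM, so that the analysis only sees text that matches what was originally recorded.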