Prompt injection attacks on vision language models in oncology
Abstract Vision-language artificial intelligence models (VLMs) possess medical knowledge and can be employed in healthcare in numerous ways, including as image interpreters, virtual scribes, and general decision support systems. However, here, we demonstrate that current VLMs applied to medical task...
Saved in:
Main Authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
Nature Portfolio
2025-02-01
|
Series: | Nature Communications |
Online Access: | https://doi.org/10.1038/s41467-024-55631-x |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
_version_ | 1832571552327008256 |
---|---|
author | Jan Clusmann Dyke Ferber Isabella C. Wiest Carolin V. Schneider Titus J. Brinker Sebastian Foersch Daniel Truhn Jakob Nikolas Kather |
author_facet | Jan Clusmann Dyke Ferber Isabella C. Wiest Carolin V. Schneider Titus J. Brinker Sebastian Foersch Daniel Truhn Jakob Nikolas Kather |
author_sort | Jan Clusmann |
collection | DOAJ |
description | Abstract Vision-language artificial intelligence models (VLMs) possess medical knowledge and can be employed in healthcare in numerous ways, including as image interpreters, virtual scribes, and general decision support systems. However, here, we demonstrate that current VLMs applied to medical tasks exhibit a fundamental security flaw: they can be compromised by prompt injection attacks. These can be used to output harmful information just by interacting with the VLM, without any access to its parameters. We perform a quantitative study to evaluate the vulnerabilities to these attacks in four state of the art VLMs: Claude-3 Opus, Claude-3.5 Sonnet, Reka Core, and GPT-4o. Using a set of N = 594 attacks, we show that all of these models are susceptible. Specifically, we show that embedding sub-visual prompts in manifold medical imaging data can cause the model to provide harmful output, and that these prompts are non-obvious to human observers. Thus, our study demonstrates a key vulnerability in medical VLMs which should be mitigated before widespread clinical adoption. |
format | Article |
id | doaj-art-e680f8ab6d8c40a6a6d8dcb8944b6ff6 |
institution | Kabale University |
issn | 2041-1723 |
language | English |
publishDate | 2025-02-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Nature Communications |
spelling | doaj-art-e680f8ab6d8c40a6a6d8dcb8944b6ff62025-02-02T12:31:19ZengNature PortfolioNature Communications2041-17232025-02-011611910.1038/s41467-024-55631-xPrompt injection attacks on vision language models in oncologyJan Clusmann0Dyke Ferber1Isabella C. Wiest2Carolin V. Schneider3Titus J. Brinker4Sebastian Foersch5Daniel Truhn6Jakob Nikolas Kather7Else Kroener Fresenius Center for Digital Health, Technical University DresdenElse Kroener Fresenius Center for Digital Health, Technical University DresdenElse Kroener Fresenius Center for Digital Health, Technical University DresdenElse Kroener Fresenius Center for Digital Health, Technical University DresdenDigital Biomarkers for Oncology Group, German Cancer Research CenterInstitute of Pathology, University Medical Center MainzDepartment of Diagnostic and Interventional Radiology, University Hospital AachenElse Kroener Fresenius Center for Digital Health, Technical University DresdenAbstract Vision-language artificial intelligence models (VLMs) possess medical knowledge and can be employed in healthcare in numerous ways, including as image interpreters, virtual scribes, and general decision support systems. However, here, we demonstrate that current VLMs applied to medical tasks exhibit a fundamental security flaw: they can be compromised by prompt injection attacks. These can be used to output harmful information just by interacting with the VLM, without any access to its parameters. We perform a quantitative study to evaluate the vulnerabilities to these attacks in four state of the art VLMs: Claude-3 Opus, Claude-3.5 Sonnet, Reka Core, and GPT-4o. Using a set of N = 594 attacks, we show that all of these models are susceptible. Specifically, we show that embedding sub-visual prompts in manifold medical imaging data can cause the model to provide harmful output, and that these prompts are non-obvious to human observers. Thus, our study demonstrates a key vulnerability in medical VLMs which should be mitigated before widespread clinical adoption.https://doi.org/10.1038/s41467-024-55631-x |
spellingShingle | Jan Clusmann Dyke Ferber Isabella C. Wiest Carolin V. Schneider Titus J. Brinker Sebastian Foersch Daniel Truhn Jakob Nikolas Kather Prompt injection attacks on vision language models in oncology Nature Communications |
title | Prompt injection attacks on vision language models in oncology |
title_full | Prompt injection attacks on vision language models in oncology |
title_fullStr | Prompt injection attacks on vision language models in oncology |
title_full_unstemmed | Prompt injection attacks on vision language models in oncology |
title_short | Prompt injection attacks on vision language models in oncology |
title_sort | prompt injection attacks on vision language models in oncology |
url | https://doi.org/10.1038/s41467-024-55631-x |
work_keys_str_mv | AT janclusmann promptinjectionattacksonvisionlanguagemodelsinoncology AT dykeferber promptinjectionattacksonvisionlanguagemodelsinoncology AT isabellacwiest promptinjectionattacksonvisionlanguagemodelsinoncology AT carolinvschneider promptinjectionattacksonvisionlanguagemodelsinoncology AT titusjbrinker promptinjectionattacksonvisionlanguagemodelsinoncology AT sebastianfoersch promptinjectionattacksonvisionlanguagemodelsinoncology AT danieltruhn promptinjectionattacksonvisionlanguagemodelsinoncology AT jakobnikolaskather promptinjectionattacksonvisionlanguagemodelsinoncology |