Prompt injection attacks on vision language models in oncology

Abstract Vision-language artificial intelligence models (VLMs) possess medical knowledge and can be employed in healthcare in numerous ways, including as image interpreters, virtual scribes, and general decision support systems. However, here, we demonstrate that current VLMs applied to medical task...

Full description

Saved in:
Bibliographic Details
Main Authors: Jan Clusmann, Dyke Ferber, Isabella C. Wiest, Carolin V. Schneider, Titus J. Brinker, Sebastian Foersch, Daniel Truhn, Jakob Nikolas Kather
Format: Article
Language:English
Published: Nature Portfolio 2025-02-01
Series:Nature Communications
Online Access:https://doi.org/10.1038/s41467-024-55631-x
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1832571552327008256
author Jan Clusmann
Dyke Ferber
Isabella C. Wiest
Carolin V. Schneider
Titus J. Brinker
Sebastian Foersch
Daniel Truhn
Jakob Nikolas Kather
author_facet Jan Clusmann
Dyke Ferber
Isabella C. Wiest
Carolin V. Schneider
Titus J. Brinker
Sebastian Foersch
Daniel Truhn
Jakob Nikolas Kather
author_sort Jan Clusmann
collection DOAJ
description Abstract Vision-language artificial intelligence models (VLMs) possess medical knowledge and can be employed in healthcare in numerous ways, including as image interpreters, virtual scribes, and general decision support systems. However, here, we demonstrate that current VLMs applied to medical tasks exhibit a fundamental security flaw: they can be compromised by prompt injection attacks. These can be used to output harmful information just by interacting with the VLM, without any access to its parameters. We perform a quantitative study to evaluate the vulnerabilities to these attacks in four state of the art VLMs: Claude-3 Opus, Claude-3.5 Sonnet, Reka Core, and GPT-4o. Using a set of N = 594 attacks, we show that all of these models are susceptible. Specifically, we show that embedding sub-visual prompts in manifold medical imaging data can cause the model to provide harmful output, and that these prompts are non-obvious to human observers. Thus, our study demonstrates a key vulnerability in medical VLMs which should be mitigated before widespread clinical adoption.
format Article
id doaj-art-e680f8ab6d8c40a6a6d8dcb8944b6ff6
institution Kabale University
issn 2041-1723
language English
publishDate 2025-02-01
publisher Nature Portfolio
record_format Article
series Nature Communications
spelling doaj-art-e680f8ab6d8c40a6a6d8dcb8944b6ff62025-02-02T12:31:19ZengNature PortfolioNature Communications2041-17232025-02-011611910.1038/s41467-024-55631-xPrompt injection attacks on vision language models in oncologyJan Clusmann0Dyke Ferber1Isabella C. Wiest2Carolin V. Schneider3Titus J. Brinker4Sebastian Foersch5Daniel Truhn6Jakob Nikolas Kather7Else Kroener Fresenius Center for Digital Health, Technical University DresdenElse Kroener Fresenius Center for Digital Health, Technical University DresdenElse Kroener Fresenius Center for Digital Health, Technical University DresdenElse Kroener Fresenius Center for Digital Health, Technical University DresdenDigital Biomarkers for Oncology Group, German Cancer Research CenterInstitute of Pathology, University Medical Center MainzDepartment of Diagnostic and Interventional Radiology, University Hospital AachenElse Kroener Fresenius Center for Digital Health, Technical University DresdenAbstract Vision-language artificial intelligence models (VLMs) possess medical knowledge and can be employed in healthcare in numerous ways, including as image interpreters, virtual scribes, and general decision support systems. However, here, we demonstrate that current VLMs applied to medical tasks exhibit a fundamental security flaw: they can be compromised by prompt injection attacks. These can be used to output harmful information just by interacting with the VLM, without any access to its parameters. We perform a quantitative study to evaluate the vulnerabilities to these attacks in four state of the art VLMs: Claude-3 Opus, Claude-3.5 Sonnet, Reka Core, and GPT-4o. Using a set of N = 594 attacks, we show that all of these models are susceptible. Specifically, we show that embedding sub-visual prompts in manifold medical imaging data can cause the model to provide harmful output, and that these prompts are non-obvious to human observers. Thus, our study demonstrates a key vulnerability in medical VLMs which should be mitigated before widespread clinical adoption.https://doi.org/10.1038/s41467-024-55631-x
spellingShingle Jan Clusmann
Dyke Ferber
Isabella C. Wiest
Carolin V. Schneider
Titus J. Brinker
Sebastian Foersch
Daniel Truhn
Jakob Nikolas Kather
Prompt injection attacks on vision language models in oncology
Nature Communications
title Prompt injection attacks on vision language models in oncology
title_full Prompt injection attacks on vision language models in oncology
title_fullStr Prompt injection attacks on vision language models in oncology
title_full_unstemmed Prompt injection attacks on vision language models in oncology
title_short Prompt injection attacks on vision language models in oncology
title_sort prompt injection attacks on vision language models in oncology
url https://doi.org/10.1038/s41467-024-55631-x
work_keys_str_mv AT janclusmann promptinjectionattacksonvisionlanguagemodelsinoncology
AT dykeferber promptinjectionattacksonvisionlanguagemodelsinoncology
AT isabellacwiest promptinjectionattacksonvisionlanguagemodelsinoncology
AT carolinvschneider promptinjectionattacksonvisionlanguagemodelsinoncology
AT titusjbrinker promptinjectionattacksonvisionlanguagemodelsinoncology
AT sebastianfoersch promptinjectionattacksonvisionlanguagemodelsinoncology
AT danieltruhn promptinjectionattacksonvisionlanguagemodelsinoncology
AT jakobnikolaskather promptinjectionattacksonvisionlanguagemodelsinoncology