Summon a demon and bind it: A grounded theory of LLM red teaming.

Engaging in the deliberate generation of abnormal outputs from Large Language Models (LLMs) by attacking them is a novel human activity. This paper presents a thorough exposition of how and why people perform such attacks, defining LLM red-teaming based on extensive and diverse evidence. Using a for...

Bibliographic Details
Main Authors: Nanna Inie, Jonathan Stray, Leon Derczynski
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0314658