Summon a demon and bind it: A grounded theory of LLM red teaming.

Engaging in the deliberate generation of abnormal outputs from Large Language Models (LLMs) by attacking them is a novel human activity. This paper presents a thorough exposition of how and why people perform such attacks, defining LLM red-teaming based on extensive and diverse evidence. Using a for...

Bibliographic Details
Main Authors:	Nanna Inie, Jonathan Stray, Leon Derczynski
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2025-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0314658

Similar Items