LLM Abuse Prevention Tool Using GCG Jailbreak Attack Detection and DistilBERT-Based Ethics Judgment

In recent years, the misuse of large language models (LLMs) has emerged as a significant issue. This paper focuses on a specific attack method known as the greedy coordinate gradient (GCG) jailbreak attack, which compels LLMs to generate responses beyond ethical boundaries. We have developed a tool...


Bibliographic Details
Main Authors: Qiuyu Chen, Shingo Yamaguchi, Yudai Yamamoto
Format: Article
Language: English
Published: MDPI AG, 2025-03-01
Series: Information
Online Access: https://www.mdpi.com/2078-2489/16/3/204