LLM Abuse Prevention Tool Using GCG Jailbreak Attack Detection and DistilBERT-Based Ethics Judgment

In recent years, the misuse of large language models (LLMs) has emerged as a significant issue. This paper focuses on a specific attack method known as the greedy coordinate gradient (GCG) jailbreak attack, which compels LLMs to generate responses beyond ethical boundaries. We have developed a tool...


Bibliographic Details
Main Authors: Qiuyu Chen, Shingo Yamaguchi, Yudai Yamamoto
Format: Article
Language: English
Published: MDPI AG, 2025-03-01
Series: Information
Online Access: https://www.mdpi.com/2078-2489/16/3/204