Bilingual Dialogue Dataset with Personality and Emotion Annotations for Personality Recognition in Education

Bibliographic Details
Main Authors: Zhi Liu, Yao Xiao, Zhu Su, Luyao Ye, Kaili Lu, Xian Peng
Format: Article
Language: English
Published: Nature Portfolio 2025-03-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-025-04836-w
Description
Summary: Dialogue datasets are essential for advancing natural language processing (NLP) tasks. However, many existing datasets lack integrated annotations for personality and emotion, which limits models’ ability to capture these aspects and generate personalized, human-like dialogues and ultimately impacts user experience. To address this challenge, we construct bilingual dialogue datasets in Chinese and English that incorporate Big Five personality traits and emotion annotations. We use the AutoGen tool within a multi-agent framework to generate multi-turn question-answering dialogue datasets based on fables. By creating persona agents with diverse personalities, we increase the heterogeneity of personalities, overcoming previous limitations in personality diversity. Finally, we validate the utterance quality in the dataset and investigate the alignment between conversational utterances and speakers’ personality traits. Moreover, because each utterance carries an emotion annotation, the dataset offers significant potential for developing emotion-aware systems that automatically detect personality traits. It serves as a valuable resource for advancing emotionally intelligent dialogue systems and research in personality and affective computing.
ISSN: 2052-4463