Dataset on cognitive distortions for text classification in Indonesian language. Mendeley Data.

Bibliographic Details
Main Authors:	Nyoman Putra Sastra, Linawati, Gede Sukadarmika, I.P.G.H. Suputra, Ni Made Ariwilani, Ni Luh Indah Desira Swandi
Format:	Article
Language:	English
Published:	Elsevier 2025-08-01
Series:	Data in Brief
Subjects:	Cognitive distortion; Sentences classification; Mental health; Natural language processing; Distorted sentences
Online Access:	http://www.sciencedirect.com/science/article/pii/S2352340925005633

Description
Summary:	The collected dataset is a novel collection of Indonesian-language sentences annotated for cognitive distortions. A cognitive distortion is generally a systematic bias in information processing that reinforces negative thinking and contributes to depression. The data were collected via a questionnaire administered to Indonesian participants aged 18 and above, capturing anonymized demographic information, history of engagement with mental health professionals, and responses to 10 open-ended life-related questions. Two licensed psychologists manually annotated each sentence, first identifying whether a cognitive distortion was present and then assigning each distorted sentence to one of 10 predefined distortion types. The dataset comprises 4,662 labeled sentences: 2,246 non-distorted and 2,416 distorted. To address the relatively small number of samples in certain distortion classes, data augmentation was performed using back-translation, which brought the final dataset to 4,992 entries. To our knowledge, this is the first Indonesian text classification dataset in the mental health domain that specifically targets cognitive distortions. The resource is valuable for natural language processing (NLP) research, particularly text classification, and may also support computational psychology studies. It provides a foundation for developing NLP tools that detect cognitive distortions in low-resource languages and contributes to mental health research in Indonesia.
ISSN:	2352-3409


Similar Items