Dataset on cognitive distortions for text classification in Indonesian language (Mendeley Data)

The collected dataset is a novel collection of Indonesian-language sentences annotated for cognitive distortions. A cognitive distortion is, broadly, a systematic bias in information processing that reinforces negative thinking and contributes to depression. The data was collected via a questionnaire ad...

Bibliographic Details
Main Authors:	Nyoman Putra Sastra, Linawati, Gede Sukadarmika, I.P.G.H. Suputra, Ni Made Ariwilani, Ni Luh Indah Desira Swandi
Format:	Article
Language:	English
Published:	Elsevier 2025-08-01
Series:	Data in Brief
Subjects:	Cognitive distortion; Sentences classification; Mental health; Natural language processing; Distorted sentences
Online Access:	http://www.sciencedirect.com/science/article/pii/S2352340925005633

author	Nyoman Putra Sastra, Linawati, Gede Sukadarmika, I.P.G.H. Suputra, Ni Made Ariwilani, Ni Luh Indah Desira Swandi
collection	DOAJ
description	The collected dataset is a novel collection of Indonesian-language sentences annotated for cognitive distortions. A cognitive distortion is, broadly, a systematic bias in information processing that reinforces negative thinking and contributes to depression. The data was collected via a questionnaire administered to Indonesian participants aged 18 and above, capturing anonymized demographic information, history of engagement with mental health professionals, and responses to 10 open-ended questions about their lives. Two licensed psychologists manually annotated each sentence, first identifying whether a cognitive distortion was present and then assigning each distorted sentence to one of 10 predefined distortion types. The dataset comprises 4,662 labeled sentences: 2,246 non-distorted and 2,416 distorted. To compensate for the small number of samples in some distortion classes, the data was augmented using back-translation, bringing the final dataset to 4,992 entries. To our knowledge, this is the first Indonesian text classification dataset in the mental health domain that specifically targets cognitive distortions. The resource is valuable for natural language processing (NLP) research, particularly text classification, and may also support computational psychology studies. It provides a foundation for developing NLP tools that detect cognitive distortions in low-resource languages and contributes to mental health research in Indonesia.
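The description states the two-stage labeling scheme (presence of a distortion, then one of 10 types) and that augmentation used back-translation, but not how either was implemented. The Python sketch below illustrates both under explicit assumptions: the `Sentence` record layout, the `Translator` interface, English as the pivot language, and the toy translator with its placeholder label `type_A` are all hypothetical and are not the authors' pipeline or the dataset's actual file format. It also checks the reported counts; the 330-entry difference is plain arithmetic on the published totals.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Sentence:
    """Hypothetical record layout; the released files may use other columns."""
    text: str
    is_distorted: bool                     # stage 1: is a distortion present?
    distortion_type: Optional[str] = None  # stage 2: one of the 10 types, else None


# Assumed translator interface: (text, source_lang, target_lang) -> text.
# Any machine-translation system could be wrapped to fit it.
Translator = Callable[[str, str, str], str]


def back_translate(items: List[Sentence], translate: Translator,
                   pivot: str = "en") -> List[Sentence]:
    """Paraphrase Indonesian ("id") sentences via a pivot language, keeping labels."""
    augmented = []
    for s in items:
        paraphrase = translate(translate(s.text, "id", pivot), pivot, "id")
        # Discard empty results and round trips that reproduce the original.
        if paraphrase.strip() and paraphrase != s.text:
            augmented.append(Sentence(paraphrase, s.is_distorted, s.distortion_type))
    return augmented


# Counts reported in the description.
NON_DISTORTED, DISTORTED = 2_246, 2_416
LABELED_TOTAL, FINAL_TOTAL = 4_662, 4_992
assert NON_DISTORTED + DISTORTED == LABELED_TOTAL
print("entries added by augmentation:", FINAL_TOTAL - LABELED_TOTAL)  # 330


# Toy usage with a lookup-table "translator" (hypothetical, for illustration only).
def toy_translate(text: str, src: str, tgt: str) -> str:
    table = {
        ("aku selalu gagal", "id", "en"): "i always fail",
        ("i always fail", "en", "id"): "saya selalu gagal",
    }
    return table.get((text, src, tgt), text)


seed = [Sentence("aku selalu gagal", True, "type_A")]  # "type_A" is a placeholder label
print(back_translate(seed, toy_translate))
# [Sentence(text='saya selalu gagal', is_distorted=True, distortion_type='type_A')]
```

In this sketch each back-translated copy inherits both labels from its source sentence, so augmenting only the sparse distortion classes amounts to passing just those sentences to `back_translate`.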
format	Article
id	doaj-art-ee506994fe68425aa8f3dd8bfc89feaf
institution	DOAJ
issn	2352-3409
language	English
publishDate	2025-08-01
publisher	Elsevier
record_format	Article
series	Data in Brief
record_timestamp	2025-08-20T03:18:20Z
doi	10.1016/j.dib.2025.111836
volume	61
article_number	111836
affiliations	Nyoman Putra Sastra, Linawati, Gede Sukadarmika: Department of Doctoral Engineering, Faculty of Engineering, University of Udayana, Denpasar 80113, Indonesia
	I.P.G.H. Suputra (corresponding author): Department of Informatics, Faculty of Math and Natural Science, University of Udayana, Badung 80361, Indonesia
	Ni Made Ariwilani, Ni Luh Indah Desira Swandi: Department of Psychology, Faculty of Medicine, University of Udayana, Denpasar 80113, Indonesia
title	Dataset on cognitive distortions for text classification in Indonesian language (Mendeley Data)
topic	Cognitive distortion; Sentences classification; Mental health; Natural language processing; Distorted sentences
url	http://www.sciencedirect.com/science/article/pii/S2352340925005633
