Dataset on cognitive distortions for text classification in Indonesian language (Mendeley Data)

The collected dataset is a novel collection of Indonesian-language sentences annotated for cognitive distortions. Cognitive distortion is generally a systematic bias in information processing that reinforces negative thinking and contributes to depression. The data was collected via a questionnaire ad...

Bibliographic Details
Main Authors: Nyoman Putra Sastra, Linawati, Gede Sukadarmika, I.P.G.H. Suputra, Ni Made Ariwilani, Ni Luh Indah Desira Swandi
Format: Article
Language: English
Published: Elsevier 2025-08-01
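Series: Data in Brief
Subjects: Cognitive distortion; Sentences classification; Mental health; Natural language processing; Distorted sentences
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340925005633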
author Nyoman Putra Sastra
Linawati
Gede Sukadarmika
I.P.G.H. Suputra
Ni Made Ariwilani
Ni Luh Indah Desira Swandi
collection DOAJ
description The collected dataset is a novel collection of Indonesian-language sentences annotated for cognitive distortions. Cognitive distortion is generally a systematic bias in information processing that reinforces negative thinking and contributes to depression. The data was collected via a questionnaire administered to Indonesian participants aged 18 and above, capturing demographic information (anonymized), history of engagement with mental health professionals, and responses to 10 open-ended life-related questions. Two licensed psychologists manually annotated each sentence, first identifying the presence of cognitive distortions and then categorizing each distorted sentence into one of 10 predefined distortion types. The dataset comprises 4,662 labeled sentences, including 2,246 non-distorted and 2,416 distorted instances. To address the relatively small number of samples in certain distortion classes, data augmentation was performed using the back-translation method. This process resulted in a final dataset size of 4,992 entries. To our knowledge, this is the first Indonesian text classification dataset in the mental health domain, specifically targeting cognitive distortions. This resource is valuable for natural language processing (NLP) research, particularly in text classification tasks, and may also support computational psychology studies. The dataset provides a foundation for developing NLP tools to detect cognitive distortions in low-resource languages and contributes to mental health research in Indonesia.
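The counts quoted in the description can be cross-checked with a few lines of arithmetic. The Python sketch below is illustrative only and is not part of the record or the dataset: the constant and function names are hypothetical, and it assumes that the gap between the 4,662 labeled sentences and the 4,992 final entries consists entirely of back-translated additions. The description does not say which distortion classes those additions went to.

```python
# Counts as reported in the record description.
NON_DISTORTED = 2246
DISTORTED = 2416
LABELED_TOTAL = 4662
AUGMENTED_TOTAL = 4992


def summarize_counts():
    """Cross-check the reported totals and derive simple ratios.

    Assumption: every entry beyond LABELED_TOTAL was produced by
    back-translation augmentation (the record gives no breakdown).
    """
    # The two binary classes should account for every labeled sentence.
    assert NON_DISTORTED + DISTORTED == LABELED_TOTAL

    added = AUGMENTED_TOTAL - LABELED_TOTAL
    return {
        "distorted_share": DISTORTED / LABELED_TOTAL,
        "added_by_augmentation": added,
        "augmentation_growth": added / LABELED_TOTAL,
    }


if __name__ == "__main__":
    for key, value in summarize_counts().items():
        print(key, value)
```

Under that assumption the binary split is close to balanced, so the augmentation step matters mainly for the 10-way distortion-type labels rather than for the distorted versus non-distorted decision.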
format Article
id doaj-art-ee506994fe68425aa8f3dd8bfc89feaf
institution DOAJ
issn 2352-3409
language English
publishDate 2025-08-01
publisher Elsevier
record_format Article
series Data in Brief
spelling doaj-art-ee506994fe68425aa8f3dd8bfc89feaf
2025-08-20T03:18:20Z
eng
Elsevier
Data in Brief
2352-3409
2025-08-01
61
111836
10.1016/j.dib.2025.111836
Dataset on cognitive distortions for text classification in Indonesian language (Mendeley Data)
Nyoman Putra Sastra 0
Linawati 1
Gede Sukadarmika 2
I.P.G.H. Suputra 3
Ni Made Ariwilani 4
Ni Luh Indah Desira Swandi 5
0 Department of Doctoral Engineering, Faculty of Engineering, University of Udayana, Denpasar 80113, Indonesia
1 Department of Doctoral Engineering, Faculty of Engineering, University of Udayana, Denpasar 80113, Indonesia
2 Department of Doctoral Engineering, Faculty of Engineering, University of Udayana, Denpasar 80113, Indonesia
3 Department of Informatics, Faculty of Math and Natural Science, University of Udayana, Badung 80361, Indonesia; Corresponding author.
4 Department of Psychology, Faculty of Medicine, University of Udayana, Denpasar 80113, Indonesia
5 Department of Psychology, Faculty of Medicine, University of Udayana, Denpasar 80113, Indonesia
http://www.sciencedirect.com/science/article/pii/S2352340925005633
Cognitive distortion
Sentences classification
Mental health
Natural language processing
Distorted sentences
title Dataset on cognitive distortions for text classification in Indonesian language (Mendeley Data)
topic Cognitive distortion
Sentences classification
Mental health
Natural language processing
Distorted sentences
url http://www.sciencedirect.com/science/article/pii/S2352340925005633