Dataset on cognitive distortions for text classification in Indonesian language (Mendeley Data)
The collected dataset is a novel collection of Indonesian-language sentences annotated for cognitive distortions. A cognitive distortion is generally a systematic bias in information processing that reinforces negative thinking and contributes to depression. The data was collected via a questionnaire ad...
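| Main Authors: | Nyoman Putra Sastra, Linawati, Gede Sukadarmika, I.P.G.H. Suputra, Ni Made Ariwilani, Ni Luh Indah Desira Swandi |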
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-08-01 |
| Series: | Data in Brief |
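| Subjects: | Cognitive distortion; Sentences classification; Mental health; Natural language processing; Distorted sentences |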
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2352340925005633 |
| _version_ | 1849700225975844864 |
|---|---|
| author | Nyoman Putra Sastra; Linawati; Gede Sukadarmika; I.P.G.H. Suputra; Ni Made Ariwilani; Ni Luh Indah Desira Swandi |
| author_facet | Nyoman Putra Sastra; Linawati; Gede Sukadarmika; I.P.G.H. Suputra; Ni Made Ariwilani; Ni Luh Indah Desira Swandi |
| author_sort | Nyoman Putra Sastra |
| collection | DOAJ |
| description | The collected dataset is a novel collection of Indonesian-language sentences annotated for cognitive distortions. A cognitive distortion is generally a systematic bias in information processing that reinforces negative thinking and contributes to depression. The data was collected via a questionnaire administered to Indonesian participants aged 18 and above, capturing demographic information (anonymized), history of engagement with mental health professionals, and responses to 10 open-ended life-related questions. Two licensed psychologists manually annotated each sentence, first identifying whether a cognitive distortion was present and then assigning each distorted sentence to one of 10 predefined distortion types. The dataset comprises 4,662 labeled sentences: 2,246 non-distorted and 2,416 distorted. To address the relatively small number of samples in certain distortion classes, data augmentation was performed using the back-translation method, bringing the final dataset to 4,992 entries. To our knowledge, this is the first Indonesian text classification dataset in the mental health domain that specifically targets cognitive distortions. This resource is valuable for natural language processing (NLP) research, particularly text classification tasks, and may also support computational psychology studies. The dataset provides a foundation for developing NLP tools to detect cognitive distortions in low-resource languages and contributes to mental health research in Indonesia. |
| format | Article |
| id | doaj-art-ee506994fe68425aa8f3dd8bfc89feaf |
| institution | DOAJ |
| issn | 2352-3409 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Data in Brief |
| spelling | doaj-art-ee506994fe68425aa8f3dd8bfc89feaf; 2025-08-20T03:18:20Z; eng; Elsevier; Data in Brief; 2352-3409; 2025-08-01; 61; 111836; 10.1016/j.dib.2025.111836; Dataset on cognitive distortions for text classification in Indonesian language (Mendeley Data); Nyoman Putra Sastra (0), Linawati (1), Gede Sukadarmika (2), I.P.G.H. Suputra (3), Ni Made Ariwilani (4), Ni Luh Indah Desira Swandi (5); (0) Department of Doctoral Engineering, Faculty of Engineering, University of Udayana, Denpasar 80113, Indonesia; (1) Department of Doctoral Engineering, Faculty of Engineering, University of Udayana, Denpasar 80113, Indonesia; (2) Department of Doctoral Engineering, Faculty of Engineering, University of Udayana, Denpasar 80113, Indonesia; (3) Department of Informatics, Faculty of Math and Natural Science, University of Udayana, Badung 80361, Indonesia (corresponding author); (4) Department of Psychology, Faculty of Medicine, University of Udayana, Denpasar 80113, Indonesia; (5) Department of Psychology, Faculty of Medicine, University of Udayana, Denpasar 80113, Indonesia; The collected dataset is a novel collection of Indonesian-language sentences annotated for cognitive distortions. A cognitive distortion is generally a systematic bias in information processing that reinforces negative thinking and contributes to depression. The data was collected via a questionnaire administered to Indonesian participants aged 18 and above, capturing demographic information (anonymized), history of engagement with mental health professionals, and responses to 10 open-ended life-related questions. Two licensed psychologists manually annotated each sentence, first identifying whether a cognitive distortion was present and then assigning each distorted sentence to one of 10 predefined distortion types. The dataset comprises 4,662 labeled sentences: 2,246 non-distorted and 2,416 distorted. To address the relatively small number of samples in certain distortion classes, data augmentation was performed using the back-translation method, bringing the final dataset to 4,992 entries. To our knowledge, this is the first Indonesian text classification dataset in the mental health domain that specifically targets cognitive distortions. This resource is valuable for natural language processing (NLP) research, particularly text classification tasks, and may also support computational psychology studies. The dataset provides a foundation for developing NLP tools to detect cognitive distortions in low-resource languages and contributes to mental health research in Indonesia.; http://www.sciencedirect.com/science/article/pii/S2352340925005633; Cognitive distortion; Sentences classification; Mental health; Natural language processing; Distorted sentences |
| spellingShingle | Nyoman Putra Sastra; Linawati; Gede Sukadarmika; I.P.G.H. Suputra; Ni Made Ariwilani; Ni Luh Indah Desira Swandi; Dataset on cognitive distortions for text classification in Indonesian language (Mendeley Data); Data in Brief; Cognitive distortion; Sentences classification; Mental health; Natural language processing; Distorted sentences |
| title | Dataset on cognitive distortions for text classification in Indonesian language (Mendeley Data) |
| title_full | Dataset on cognitive distortions for text classification in Indonesian language (Mendeley Data) |
| title_fullStr | Dataset on cognitive distortions for text classification in Indonesian language (Mendeley Data) |
| title_full_unstemmed | Dataset on cognitive distortions for text classification in Indonesian language (Mendeley Data) |
| title_short | Dataset on cognitive distortions for text classification in Indonesian language (Mendeley Data) |
| title_sort | dataset on cognitive distortions for text classification in indonesian language mendeley data |
| topic | Cognitive distortion; Sentences classification; Mental health; Natural language processing; Distorted sentences |
| url | http://www.sciencedirect.com/science/article/pii/S2352340925005633 |
| work_keys_str_mv | AT nyomanputrasastra datasetoncognitivedistortionsfortextclassificationinindonesianlanguagemendeleydata AT linawati datasetoncognitivedistortionsfortextclassificationinindonesianlanguagemendeleydata AT gedesukadarmika datasetoncognitivedistortionsfortextclassificationinindonesianlanguagemendeleydata AT ipghsuputra datasetoncognitivedistortionsfortextclassificationinindonesianlanguagemendeleydata AT nimadeariwilani datasetoncognitivedistortionsfortextclassificationinindonesianlanguagemendeleydata AT niluhindahdesiraswandi datasetoncognitivedistortionsfortextclassificationinindonesianlanguagemendeleydata |