SeisAug: A data augmentation python toolkit
A common limitation in applying deep learning and machine learning techniques is the scarcity of labelled data, which can be addressed through data augmentation (DA). SeisAug is a DA Python toolkit that addresses this challenge in seismological studies. DA helps to balance the imbalanced classes o...
| Main Authors: | D. Pragnath, G. Srijayanthi, Santosh Kumar, Sumer Chopra |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-02-01 |
| Series: | Applied Computing and Geosciences |
| Subjects: | Deep learning; Augmentation; Seismic signals; Earthquakes; Spectrum; Filters |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S259019742500014X |
| _version_ | 1849392022798991360 |
|---|---|
| author | D. Pragnath; G. Srijayanthi; Santosh Kumar; Sumer Chopra |
| author_sort | D. Pragnath |
| collection | DOAJ |
| description | A common limitation in applying deep learning and machine learning techniques is the scarcity of labelled data, which can be addressed through data augmentation (DA). SeisAug is a DA Python toolkit that addresses this challenge in seismological studies. DA helps to balance the imbalanced classes of a dataset by creating more examples of under-represented classes. It significantly mitigates overfitting by increasing the volume of training data and introducing variability, thereby improving the model's performance on unseen data. Given the rapid advancements in deep learning for seismology, ‘SeisAug’ assists in extensibility by generating a substantial amount of data (2–6 times more data), which can aid in developing an indigenous robust model. Further, this study demonstrates the role of DA in developing a robust model. For this, we used a basic two-class identification model that distinguishes earthquake (signal) from noise (non-earthquake). The model is trained with the original, 1× augmented, and 5× augmented datasets, and its performance metrics are evaluated for each. The model trained with the 5× augmented dataset performs markedly better, with an accuracy of 0.991, AUC of 0.999, and AUC-PR of 0.999, compared with the model trained on the original dataset, which reaches an accuracy of 0.50, AUC of 0.75, and AUC-PR of 0.80. Furthermore, by making all code available on GitHub, the toolkit facilitates the easy application of DA techniques, enabling end users to enhance their seismological waveform datasets effectively and overcome the initial drawbacks posed by the scarcity of labelled data. |
| format | Article |
| id | doaj-art-e9802e9dc3fe4700a390abbc19e986f7 |
| institution | Kabale University |
| issn | 2590-1974 |
| language | English |
| publishDate | 2025-02-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Applied Computing and Geosciences |
| spelling | doaj-art-e9802e9dc3fe4700a390abbc19e986f7; 2025-08-20T03:40:51Z; eng; Elsevier; Applied Computing and Geosciences; 2590-1974; 2025-02-01; 25; 100232; 10.1016/j.acags.2025.100232; SeisAug: A data augmentation python toolkit; D. Pragnath (Institute of Seismological Research, Gandhinagar, India; Gujarat University, Ahmedabad, India); G. Srijayanthi (Institute of Seismological Research, Gandhinagar, India; corresponding author); Santosh Kumar (Institute of Seismological Research, Gandhinagar, India); Sumer Chopra (Institute of Seismological Research, Gandhinagar, India); http://www.sciencedirect.com/science/article/pii/S259019742500014X; Deep learning; Augmentation; Seismic signals; Earthquakes; Spectrum; Filters |
| title | SeisAug: A data augmentation python toolkit |
| topic | Deep learning; Augmentation; Seismic signals; Earthquakes; Spectrum; Filters |
| url | http://www.sciencedirect.com/science/article/pii/S259019742500014X |
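
The description above says that DA enlarges a labelled waveform dataset by creating varied copies of existing examples. The following is a minimal sketch of that idea using only the Python standard library. It is not SeisAug's API: the function names (`add_noise`, `time_shift`, `scale_amplitude`, `augment_dataset`), the choice of transforms, and all parameter values are assumptions made for illustration, and traces are represented as plain lists of floats.

```python
import random


def add_noise(trace, noise_level=0.05, rng=random):
    """Add zero-mean Gaussian noise scaled to the trace's peak amplitude."""
    peak = max(abs(x) for x in trace) or 1.0  # avoid a zero scale on flat traces
    return [x + rng.gauss(0.0, noise_level * peak) for x in trace]


def time_shift(trace, shift):
    """Circularly shift a non-empty trace by `shift` samples (positive = later)."""
    shift %= len(trace)
    if shift == 0:
        return list(trace)
    return trace[-shift:] + trace[:-shift]


def scale_amplitude(trace, factor):
    """Multiply every sample by a constant factor."""
    return [factor * x for x in trace]


def augment_dataset(traces, labels, copies=5, seed=0):
    """Return the original traces plus `copies` perturbed versions of each.

    Each augmented copy keeps the label of the trace it was derived from.
    All parameter ranges below are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    out_traces, out_labels = list(traces), list(labels)
    for trace, label in zip(traces, labels):
        for _ in range(copies):
            new = time_shift(trace, rng.randrange(len(trace)))
            new = scale_amplitude(new, rng.uniform(0.5, 1.5))
            new = add_noise(new, 0.05, rng)
            out_traces.append(new)
            out_labels.append(label)
    return out_traces, out_labels


if __name__ == "__main__":
    traces = [[0.0, 1.0, 0.0, -1.0], [0.5, 0.5, 0.0, 0.0]]
    labels = [1, 0]  # hypothetical labels: 1 = earthquake, 0 = noise
    aug_traces, aug_labels = augment_dataset(traces, labels, copies=5)
    print(len(traces), "->", len(aug_traces), "traces")
```

In this sketch, `copies=5` yields a dataset six times the size of the input (the originals plus five variants per trace). To balance classes rather than grow them uniformly, the same function could be applied only to the traces of the under-represented class.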