SeisAug: A data augmentation python toolkit

Bibliographic Details
Main Authors: D. Pragnath, G. Srijayanthi, Santosh Kumar, Sumer Chopra
Format: Article
Language: English
Published: Elsevier 2025-02-01
Series: Applied Computing and Geosciences
Online Access: http://www.sciencedirect.com/science/article/pii/S259019742500014X
Collection: DOAJ

Description: A common limitation in applying deep learning and machine learning techniques is the scarcity of labelled data, which can be addressed through data augmentation (DA). SeisAug is a Python DA toolkit that addresses this challenge in seismological studies. DA helps balance the imbalanced classes of a dataset by creating more examples of under-represented classes. It significantly mitigates overfitting by increasing the volume of training data and introducing variability, thereby improving the model's performance on unseen data. Given the rapid advances in deep learning for seismology, SeisAug supports extensibility by generating substantially more data (2–6 times the original volume), which can aid in developing a robust indigenous model. The study further demonstrates the role of DA in developing a robust model, using a basic two-class model that discriminates between earthquake (signal) and noise (non-earthquake). The model is trained on the original dataset and on 1× and 5× augmented datasets, and its performance metrics are evaluated. The model trained on the 5× augmented dataset significantly outperforms the model trained on the original dataset, with an accuracy of 0.991, an AUC of 0.999 and an AUC-PR of 0.999, compared with an accuracy of 0.50, an AUC of 0.75 and an AUC-PR of 0.80. Furthermore, because all code is available on GitHub, the toolkit makes DA techniques easy to apply, enabling end users to enhance their seismological waveform datasets effectively and overcome the initial limitations posed by the scarcity of labelled data.
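
The description names two uses of DA: enlarging a training set by a fixed factor (the 1× and 5× augmented datasets) and balancing classes by generating extra examples of the under-represented class. The record does not list SeisAug's actual transforms or API, so the following is only a minimal NumPy sketch under stated assumptions. It uses three generic waveform perturbations (additive Gaussian noise at a target SNR, a random circular time shift and random amplitude scaling), and it assumes that an "N× augmented" dataset means the original traces plus N perturbed copies of each. All function names are hypothetical and are not SeisAug functions.

```python
import numpy as np

# Hypothetical, generic waveform perturbations; NOT the SeisAug API.

def add_gaussian_noise(trace, rng, snr_db=20.0):
    """Add white Gaussian noise so the result has roughly the given SNR (dB)."""
    signal_power = np.mean(trace ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return trace + rng.normal(0.0, np.sqrt(noise_power), size=trace.shape)

def time_shift(trace, rng, max_shift=50):
    """Circularly shift the trace by a random number of samples."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(trace, shift)

def amplitude_scale(trace, rng, low=0.5, high=1.5):
    """Multiply the trace by a random gain factor in [low, high)."""
    return trace * rng.uniform(low, high)

AUGMENTERS = (add_gaussian_noise, time_shift, amplitude_scale)

def augment_dataset(traces, labels, factor, seed=0):
    """Return the originals plus `factor` randomly perturbed copies of each trace.

    Assumption: factor=1 doubles the dataset and factor=5 yields six times
    the original size.
    """
    rng = np.random.default_rng(seed)
    out_x, out_y = [traces], [labels]
    for _ in range(factor):
        augmented = np.stack(
            [AUGMENTERS[rng.integers(len(AUGMENTERS))](tr, rng) for tr in traces]
        )
        out_x.append(augmented)
        out_y.append(labels)
    return np.concatenate(out_x), np.concatenate(out_y)

def balance_by_augmentation(traces, labels, seed=0):
    """Augment under-represented classes until every class matches the largest."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    out_x, out_y = [traces], [labels]
    for cls, count in zip(classes, counts):
        members = traces[labels == cls]
        for _ in range(target - count):
            tr = members[rng.integers(len(members))]
            aug = AUGMENTERS[rng.integers(len(AUGMENTERS))]
            out_x.append(aug(tr, rng)[np.newaxis, :])
            out_y.append(np.array([cls]))
    return np.concatenate(out_x), np.concatenate(out_y)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.normal(size=(10, 3000))   # 10 synthetic single-component traces
    y = np.array([1] * 2 + [0] * 8)   # 2 "earthquake" (1), 8 "noise" (0)
    x5, y5 = augment_dataset(x, y, factor=5)
    print(x5.shape)                   # (60, 3000)
    xb, yb = balance_by_augmentation(x, y)
    print(np.bincount(yb))            # [8 8]
```

Under this interpretation, factors of 1 to 5 give 2 to 6 times the original volume, which is consistent with the range quoted in the description. Real seismic augmentation would typically also respect phase arrivals and instrument characteristics, which this sketch ignores.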
Record ID: doaj-art-e9802e9dc3fe4700a390abbc19e986f7
Institution: Kabale University
ISSN: 2590-1974
Volume: 25, article 100232
DOI: 10.1016/j.acags.2025.100232
Affiliations:
D. Pragnath: Institute of Seismological Research, Gandhinagar, India; Gujarat University, Ahmedabad, India
G. Srijayanthi (corresponding author): Institute of Seismological Research, Gandhinagar, India
Santosh Kumar: Institute of Seismological Research, Gandhinagar, India
Sumer Chopra: Institute of Seismological Research, Gandhinagar, India
Keywords: Deep learning; Augmentation; Seismic signals; Earthquakes; Spectrum; Filters