Do more with less: Exploring semi-supervised learning for geological image classification


Bibliographic Details
Main Authors: Hisham I. Mamode, Gary J. Hampson, Cédric M. John
Format: Article
Language: English
Published: Elsevier 2025-02-01
Series: Applied Computing and Geosciences
Online Access: http://www.sciencedirect.com/science/article/pii/S2590197424000636
collection DOAJ
description Labelled datasets within geoscience can often be small, with data acquisition both costly and challenging, and their interpretation and downstream use in machine learning difficult due to data scarcity. Deep learning algorithms require large datasets to learn a robust relationship between the data and its label and avoid overfitting. To overcome the paucity of data, transfer learning has been employed in classification tasks. But an alternative exists: there is often a large corpus of unlabeled data which may enhance the learning process. To evaluate this potential for subsurface data, we compare a high-performance semi-supervised learning (SSL) algorithm (SimCLRv2) with supervised transfer learning on a Convolutional Neural Network (CNN) in geological image classification. We tested the two approaches on a classification task of sediment disturbance from cores of International Ocean Drilling Program (IODP) Expeditions 383 and 385. Our results show that semi-supervised transfer learning can be an effective strategy to adopt, with SimCLRv2 capable of producing representations comparable to those of supervised transfer learning. However, attempts to enhance the performance of semi-supervised transfer learning with task-specific unlabeled images during self-supervision degraded representations. Significantly, we demonstrate that SimCLRv2 trained on a dataset of core disturbance images can out-perform supervised transfer learning of a CNN once a critical number of task-specific unlabeled images are available for self-supervision. The gain in performance compared to supervised transfer learning is 1% and 3% for binary and multi-class classification, respectively. Supervised transfer learning can be deployed with comparative ease, whereas current SSL algorithms such as SimCLRv2 require more effort. We recommend that SSL be explored in cases when large amounts of unlabeled task-specific images exist and an improvement of a few percent in metrics matters. When examining small, highly specialized datasets, without large amounts of unlabeled images, supervised transfer learning might be the best strategy to adopt. Overall, SSL is a promising approach and future work should explore this approach utilizing different dataset types, quantity, and quality.
id doaj-art-9435fa1ae2e748189fe9964c23babc29
institution DOAJ
issn 2590-1974
spelling doaj-art-9435fa1ae2e748189fe9964c23babc29 2025-08-20T02:40:40Z eng Elsevier Applied Computing and Geosciences 2590-1974 2025-02-01 25 100216 10.1016/j.acags.2024.100216
Do more with less: Exploring semi-supervised learning for geological image classification
Hisham I. Mamode (corresponding author), Gary J. Hampson, Cédric M. John
Department of Earth Science and Engineering, Imperial College London, London, SW7 2AZ, UK (all three authors)
http://www.sciencedirect.com/science/article/pii/S2590197424000636