Drilling Condition Identification Method for Imbalanced Datasets

To address the challenges posed by class imbalance and temporal dependency in drilling condition data and enhance the accuracy of condition identification, this study proposes an integrated method combining feature engineering, data resampling, and deep learning model optimization. Firstly, a featur...

Full description

Saved in:
Bibliographic Details
Main Authors: Yibing Yu, Huilin Yang, Fengjia Peng, Xi Wang
Format: Article
Language:English
Published: MDPI AG 2025-03-01
Series:Applied Sciences
Subjects:
Online Access:https://www.mdpi.com/2076-3417/15/6/3362
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:To address the challenges posed by class imbalance and temporal dependency in drilling condition data and enhance the accuracy of condition identification, this study proposes an integrated method combining feature engineering, data resampling, and deep learning model optimization. Firstly, a feature selection strategy based on weighted symmetrical uncertainty is employed, assigning higher weights to critical features that distinguish minority classes, thereby enhancing class contrast and improving the classification capability of the model. Secondly, a sliding-window-based Synthetic Minority Oversampling Technique (SMOTE) algorithm is developed, which generates new minority-class samples while preserving temporal dependencies, achieving balanced data distribution among classes. Finally, a coupled model integrating bidirectional long short-term memory (BiLSTM) networks and gated recurrent units (GRUs) is constructed. The BiLSTM component captures global contextual information, while the GRU efficiently learns features from complex sequential data. The proposed approach was validated using logging data from 14 wells and compared against existing models, including RNN, CNN, FCN, and LSTM. The experimental results demonstrated that the proposed method achieved classification <i>F1</i> score improvements of 8.95%, 9.58%, 10.25%, and 8.59%, respectively, over these traditional models. Additionally, classification loss values were reduced by 0.32, 0.3315, 0.2893, and 0.2246, respectively. These findings underscore the significant improvements in both accuracy and balance achieved by the proposed method for drilling condition identification. The results indicate that the proposed approach effectively addresses class imbalance and temporal dependency issues in drilling condition data, substantially enhancing classification performance for complex sequential data. This work provides a practical and efficient solution for drilling condition recognition.
ISSN:2076-3417