Prediction of Total Organic Carbon Content in Shale Based on PCA-PSO-XGBoost

Total organic carbon (TOC) content is an important parameter for evaluating the abundance of organic matter in, and the hydrocarbon production capacity of, shale. Currently, no prediction method is applicable to all geological conditions, so exploring an efficient and accurate prediction method suit...

Bibliographic Details
Main Authors: Yingjie Meng, Chengwu Xu, Tingting Li, Tianyong Liu, Lu Tang, Jinyou Zhang
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Applied Sciences
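Subjects: shale; total organic carbon; principal component analysis; particle swarm optimization; machine learning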
Online Access: https://www.mdpi.com/2076-3417/15/7/3447
author Yingjie Meng
Chengwu Xu
Tingting Li
Tianyong Liu
Lu Tang
Jinyou Zhang
collection DOAJ
description Total organic carbon (TOC) content is an important parameter for evaluating the abundance of organic matter in, and the hydrocarbon production capacity of, shale. No current prediction method is applicable to all geological conditions, so finding an efficient and accurate method suited to the study area is of great significance. In this study, TOC content prediction models for the shale of the Qingshankou Formation in the Gulong Sag of the Songliao Basin are established with various machine learning algorithms and compared, based on measured data, principal component analysis (PCA), and the particle swarm optimization (PSO) algorithm. Pearson correlation coefficients showed that GR, AC, DEN, CNL, LLS, and LLD are the most sensitive parameters. PCA then yielded four principal components, which were used as input features. The XGBoost prediction model, established after selecting its parameters through PSO, had the highest accuracy, with an $R^2$ of 0.90 and an RMSE of 0.1545, both superior to the values of the other models. This model is suitable for predicting TOC content and provides effective technical support for shale oil exploration and development in the study area.
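The description names the building blocks of the workflow (Pearson correlation screening, PSO-based parameter selection, and scoring by $R^2$ and RMSE) without implementation details. As a rough illustration only, the standard-library Python sketch below shows one way these pieces might look. The function names, the PSO settings (`w`, `c1`, `c2`, swarm size, iteration count), and the toy linear objective are all assumptions made for this example and are not taken from the article, which applies PSO to XGBoost parameters and feeds the model PCA-derived features; PCA and XGBoost themselves are omitted here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pso(objective, bounds, n_particles=20, n_iter=60,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over box `bounds` with a basic global-best PSO.

    All default settings are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                u1, u2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * u1 * (pbest[i][d] - pos[i][d])
                             + c2 * u2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    # Toy stand-in for model tuning: fit slope and intercept of y = 2x + 1
    # by minimising RMSE. This is NOT the article's XGBoost objective.
    xs = list(range(10))
    ys = [2 * x + 1 for x in xs]

    def loss(params):
        slope, intercept = params
        return rmse(ys, [slope * x + intercept for x in xs])

    best, best_val = pso(loss, [(-5.0, 5.0), (-5.0, 5.0)])
    preds = [best[0] * x + best[1] for x in xs]
    print("best parameters:", best)
    print("RMSE:", best_val, "R^2:", r_squared(ys, preds))
    print("Pearson r(x, y):", pearson(xs, ys))
```

In a real setting the objective passed to `pso` would presumably train a regressor with the candidate parameters and return a validation RMSE, and `pearson` would be applied to each logging curve against measured TOC to screen inputs.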
format Article
id doaj-art-c4c7cc5e9e7d4e68b8ac81d52e35d5aa
institution OA Journals
issn 2076-3417
language English
publishDate 2025-03-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-c4c7cc5e9e7d4e68b8ac81d52e35d5aa 2025-08-20T02:17:00Z eng
MDPI AG, Applied Sciences, ISSN 2076-3417, 2025-03-01, 15(7), 3447, doi:10.3390/app15073447
Yingjie Meng: State Key Laboratory of Continental Shale Oil, Daqing 163318, China
Chengwu Xu: State Key Laboratory of Continental Shale Oil, Daqing 163318, China
Tingting Li: School of Earth Sciences, Northeast Petroleum University, Daqing 163318, China
Tianyong Liu: State Key Laboratory of Continental Shale Oil, Daqing 163318, China
Lu Tang: School of Earth Sciences, Northeast Petroleum University, Daqing 163318, China
Jinyou Zhang: State Key Laboratory of Continental Shale Oil, Daqing 163318, China
title Prediction of Total Organic Carbon Content in Shale Based on PCA-PSO-XGBoost
topic shale
total organic carbon
principal component analysis
particle swarm optimization
machine learning
url https://www.mdpi.com/2076-3417/15/7/3447