Developing a large language model for oil- and gas-related rock mechanics: Progress and challenges

Bibliographic Details
Main Authors: Botao Lin, Yan Jin, Qianwen Cao, Han Meng, Huiwen Pang, Shiming Wei
Format: Article
Language: English
Published: KeAi Communications Co., Ltd. 2025-04-01
Series: Natural Gas Industry B
Subjects: Large language model; Oil and gas; Rock mechanics; Data processing; Artificial intelligence
Online Access: http://www.sciencedirect.com/science/article/pii/S235285402500021X
collection DOAJ
description In recent years, large language models (LLMs) have demonstrated immense potential to enhance work efficiency and decision-making in practical applications. However, specialized LLMs have rarely been developed for oil and gas engineering. To aid in exploring and developing deep and ultra-deep unconventional reservoirs, there is a call for a dedicated LLM for oil- and gas-related rock mechanics that can handle complex professional data and make intelligent predictions and decisions. To that end, we first review general and industry-specific LLMs. Then, a systematic workflow is proposed for building this domain-specific LLM for oil and gas engineering, including data collection and processing, model construction and training, model validation, and implementation in the specific domain. Moreover, three application scenarios are investigated: knowledge extraction from textual resources, field operation with multidisciplinary integration, and intelligent decision assistance. Finally, several challenges in developing this domain-specific LLM are highlighted. Our key findings are that geological surveys, laboratory experiments, field tests, and numerical simulations form the four original sources of rock mechanics data. Those data must flow through collection, storage, processing, and governance before being fed into LLM training. This domain-specific LLM can be trained by fine-tuning a general open-source LLM with professional data and constraints such as rock mechanics datasets and principles. The LLM can then follow the commonly used training and validation processes before being implemented in the oil and gas field. However, there are three primary challenges in building this domain-specific LLM: data standardization, data security and access, and striking a balance between physics and data when building the model structure. Some of these challenges are administrative rather than technical, and overcoming them requires close collaboration among the interested parties and professional practitioners.
id doaj-art-cf3b2b309b3c48eead9f5deccb719695
institution OA Journals
issn 2352-8540
spelling Record ID: doaj-art-cf3b2b309b3c48eead9f5deccb719695
Timestamp: 2025-08-20T02:19:48Z
Language code: eng
Publisher: KeAi Communications Co., Ltd.
Series: Natural Gas Industry B
ISSN: 2352-8540
Published: 2025-04-01
Volume 12, Issue 2, Pages 110-122
DOI: 10.1016/j.ngib.2025.03.007
Title: Developing a large language model for oil- and gas-related rock mechanics: Progress and challenges
Authors and affiliations:
Botao Lin: College of Artificial Intelligence, China University of Petroleum, Beijing 102249, China
Yan Jin (corresponding author): State Key Laboratory of Petroleum Resources and Engineering, China University of Petroleum, Beijing 102249, China
Qianwen Cao: College of Artificial Intelligence, China University of Petroleum, Beijing 102249, China
Han Meng: College of Artificial Intelligence, China University of Petroleum, Beijing 102249, China
Huiwen Pang: College of Science, China University of Petroleum, Beijing 102249, China
Shiming Wei: College of Science, China University of Petroleum, Beijing 102249, China
Online access: http://www.sciencedirect.com/science/article/pii/S235285402500021X
Keywords: Large language model; Oil and gas; Rock mechanics; Data processing; Artificial intelligence
topic Large language model
Oil and gas
Rock mechanics
Data processing
Artificial intelligence