Spatio-temporal distribution of global stromatolites through geological time identified by a large language model approach

Bibliographic Details
Main Authors: Hao Li, Min Zhang
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-03-01
Series: Frontiers in Earth Science
Online Access: https://www.frontiersin.org/articles/10.3389/feart.2025.1563011/full
Description
Summary:
Introduction: A substantial amount of data embedded within diverse literature makes it time-consuming to manually extract and compile extensive datasets. The use of large language models has become essential for the efficient extraction and analysis of big data. This study utilizes ChatGPT-4 to reconstruct a global database of stromatolites, spanning from the Precambrian to the present, to enhance our understanding of their spatial and temporal dynamics throughout geological time.

Methods: The data extraction process involved several steps. First, PDF documents containing stromatolite literature were gathered and converted into text format. Second, ChatGPT-4 was employed to extract data on stromatolite occurrences, including locations, ages, strata, and facies types, from each sentence in the documents. Third, duplicates were removed, and the data were organized into three categories: 3,248 unique location-age pairs, 2,723 strata-age pairs, and 1,723 strata-age-facies type combinations. Additionally, 2,565 paleogeographical locations of stromatolite-bearing rocks were reconstructed using modern latitude and longitude coordinates and corresponding Phanerozoic ages.

Results: The newly obtained dataset reveals that stromatolite occurrences peaked during the Proterozoic, declined during the Early Phanerozoic, and exhibited fluctuations throughout the Phanerozoic. Seven global stromatolite hotspots were identified: the United States, Australia, India, Canada, China, England, and Russia. From the Cambrian to the Jurassic, stromatolites were predominantly distributed in low and middle latitudes, shifting to higher latitudes from the Cretaceous to the Quaternary. The proportion of inland aquatic stromatolites relative to marine stromatolites varied, ranging from 10% to 30% from the Mesoarchean to the Middle Mesoproterozoic, decreasing to less than 10% from the Late Mesoproterozoic to the Early Paleozoic, increasing to 10%–30% from the Devonian to the Jurassic, and remaining high (39%–53%) from the Cretaceous to the Quaternary.

Discussion: The findings highlight the temporal and spatial variability of stromatolite occurrences, shedding light on the evolution of these microbial structures over geological time. The distribution patterns suggest significant shifts in environmental conditions and provide valuable insights into paleogeographical and ecological dynamics. The use of ChatGPT-4 to extract and organize data from a large body of literature demonstrates the potential of large language models for advancing research in paleobiology and geology.
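
The deduplication and grouping step described in the summary, together with the inland-versus-marine proportion, can be illustrated with a minimal Python sketch. This is not the authors' code. The record fields, the placeholder values ("Site A", "Formation X"), the case-insensitive duplicate matching, and the function names `unique_tuples` and `inland_share` are all assumptions made for illustration. The summary does not say whether the proportion is inland / marine or inland / (inland + marine); the sketch assumes the latter.

```python
from collections import defaultdict

# Made-up placeholder records standing in for rows extracted from the literature.
records = [
    {"location": "Site A", "age": "Cambrian", "strata": "Formation X", "facies": "marine"},
    {"location": "site a ", "age": "Cambrian", "strata": "Formation X", "facies": "marine"},
    {"location": "Site B", "age": "Cretaceous", "strata": "Formation Y", "facies": "inland"},
    {"location": "Site C", "age": "Cretaceous", "strata": "Formation W", "facies": "marine"},
    {"location": "Site D", "age": "Cretaceous", "strata": "Formation Z", "facies": "inland"},
    {"location": "Site E", "age": "Jurassic", "strata": None, "facies": "marine"},
]


def unique_tuples(rows, keys):
    """Return distinct tuples of the given fields, in first-seen order.

    Rows missing any requested field are skipped. Duplicates are matched
    after trimming whitespace and lowercasing (an assumed normalization).
    """
    seen = set()
    result = []
    for row in rows:
        values = tuple(row.get(k) for k in keys)
        if any(v is None for v in values):
            continue
        normalized = tuple(v.strip().lower() for v in values)
        if normalized not in seen:
            seen.add(normalized)
            result.append(values)
    return result


def inland_share(triples):
    """Per age, inland count divided by (inland + marine) count."""
    counts = defaultdict(lambda: {"inland": 0, "marine": 0})
    for _strata, age, facies in triples:
        if facies in counts[age]:
            counts[age][facies] += 1
    return {
        age: c["inland"] / (c["inland"] + c["marine"])
        for age, c in counts.items()
        if c["inland"] + c["marine"] > 0
    }


# The three categories named in the summary.
location_age = unique_tuples(records, ("location", "age"))
strata_age = unique_tuples(records, ("strata", "age"))
strata_age_facies = unique_tuples(records, ("strata", "age", "facies"))
shares = inland_share(strata_age_facies)
```

Computing the share from the deduplicated strata-age-facies combinations, rather than from raw mentions, keeps a formation that is cited in many papers from being counted more than once. Whether the study counted this way is not stated in the summary.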
ISSN: 2296-6463