Deep Learning and Methods Based on Large Language Models Applied to Stellar Light Curve Classification

Light curves serve as a valuable source of information on stellar formation and evolution. With the rapid advancement of machine learning techniques, they can be effectively processed to extract astronomical patterns and information. In this study, we present a comprehensive evaluation of models bas...

Bibliographic Details
Main Authors: Yu-Yang Li, Yu Bai, Cunshi Wang, Mengwei Qu, Ziteng Lu, Roberto Soria, Jifeng Liu
Format: Article
Language: English
Published: American Association for the Advancement of Science (AAAS) 2025-01-01
Series: Intelligent Computing
Online Access: https://spj.science.org/doi/10.34133/icomputing.0110
collection DOAJ
description Light curves serve as a valuable source of information on stellar formation and evolution. With the rapid advancement of machine learning techniques, they can be effectively processed to extract astronomical patterns and information. In this study, we present a comprehensive evaluation of models based on deep learning and large language models (LLMs) for the automatic classification of variable star light curves, using large datasets from the Kepler and K2 missions. Special emphasis is placed on Cepheids, RR Lyrae, and eclipsing binaries, examining the influence of observational cadence and phase distribution on classification precision. Employing automated deep learning optimization, we achieve striking performance using 2 architectures: one that combines one-dimensional convolution (Conv1D) with bidirectional long short-term memory (BiLSTM) and another called the Swin Transformer. These achieved accuracies of 94% and 99%, respectively, with the latter demonstrating a notable 83% accuracy in discerning the elusive type II Cepheids that comprise merely 0.02% of the total dataset. We unveil StarWhisper LightCurve (LC), a series of 3 LLM models based on an LLM, a multimodal large language model (MLLM), and a large audio language model (LALM). Each model is fine-tuned with strategic prompt engineering and customized training methods to explore the emergent abilities of these models for astronomical data. Remarkably, StarWhisper LC series models exhibit high accuracies of around 90%, considerably reducing the need for explicit feature engineering, thereby paving the way for streamlined parallel data processing and the progression of multifaceted multimodal models in astronomical applications. The study furnishes 2 detailed catalogs illustrating the impacts of phase and sampling intervals on deep learning classification accuracy, showing that a substantial decrease of up to 14% in observation duration and 21% in sampling points can be realized without compromising accuracy by more than 10%.
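The description refers to the effect of phase distribution and sampling interval on classification. As a rough illustration only, and not the authors' pipeline, the sketch below phase-folds a synthetic light curve, thins its cadence, and measures how much of the phase range is still covered. The function names (`phase_fold`, `subsample`, `phase_coverage`), the sinusoidal test signal, and all parameter values are assumptions made for this example.

```python
import math

def phase_fold(times, period, epoch=0.0):
    """Map observation times to phases in [0, 1) for an assumed period."""
    return [((t - epoch) / period) % 1.0 for t in times]

def subsample(times, fluxes, keep_every):
    """Keep every `keep_every`-th point to mimic a sparser cadence."""
    return times[::keep_every], fluxes[::keep_every]

def phase_coverage(phases, n_bins=10):
    """Fraction of equal-width phase bins holding at least one observation."""
    occupied = {min(int(p * n_bins), n_bins - 1) for p in phases}
    return len(occupied) / n_bins

# Hypothetical data: a sinusoidal variable with a 0.5-day period,
# sampled every 0.02 days over 2 days (100 points).
period = 0.5
times = [i * 0.02 for i in range(100)]
fluxes = [1.0 + 0.1 * math.sin(2 * math.pi * t / period) for t in times]

# Thin the cadence by a factor of 4, then fold on the known period.
sparse_times, sparse_fluxes = subsample(times, fluxes, keep_every=4)
phases = phase_fold(sparse_times, period)
coverage = phase_coverage(phases)
print(f"{len(sparse_times)} points kept, phase coverage = {coverage:.2f}")
```

A check of this kind shows whether a reduced observation set still samples the full cycle of a periodic variable; the thresholds reported in the description come from the authors' catalogs, not from this sketch.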
id doaj-art-a656fc6b1b9e4c099556890ecbb9c80d
institution OA Journals
issn 2771-5892
doi 10.34133/icomputing.0110
volume 4
author_affiliation Yu-Yang Li: Key Laboratory of Optical Astronomy, National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China
Yu Bai: Key Laboratory of Optical Astronomy, National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China
Cunshi Wang: Key Laboratory of Optical Astronomy, National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China
Mengwei Qu: State Key Laboratory of Isotope Geochemistry, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou 510640, China
Ziteng Lu: School of Foreign Studies, Tongling University, Tongling, Anhui 244061, China
Roberto Soria: College of Astronomy and Space Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
Jifeng Liu: Key Laboratory of Optical Astronomy, National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China