Deep Learning and Methods Based on Large Language Models Applied to Stellar Light Curve Classification
| Main Authors: | Yu-Yang Li, Yu Bai, Cunshi Wang, Mengwei Qu, Ziteng Lu, Roberto Soria, Jifeng Liu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | American Association for the Advancement of Science (AAAS), 2025-01-01 |
| Series: | Intelligent Computing |
| Online Access: | https://spj.science.org/doi/10.34133/icomputing.0110 |
| Field | Value |
|---|---|
| author | Yu-Yang Li, Yu Bai, Cunshi Wang, Mengwei Qu, Ziteng Lu, Roberto Soria, Jifeng Liu |
| collection | DOAJ |
| description | Light curves serve as a valuable source of information on stellar formation and evolution. With the rapid advancement of machine learning techniques, they can be effectively processed to extract astronomical patterns and information. In this study, we present a comprehensive evaluation of models based on deep learning and large language models (LLMs) for the automatic classification of variable star light curves, using large datasets from the Kepler and K2 missions. Special emphasis is placed on Cepheids, RR Lyrae, and eclipsing binaries, examining the influence of observational cadence and phase distribution on classification precision. Employing automated deep learning optimization, we achieve striking performance using 2 architectures: one that combines one-dimensional convolution (Conv1D) with bidirectional long short-term memory (BiLSTM) and another called the Swin Transformer. These achieved accuracies of 94% and 99%, respectively, with the latter demonstrating a notable 83% accuracy in discerning the elusive type II Cepheids that comprise merely 0.02% of the total dataset. We unveil StarWhisper LightCurve (LC), a series of 3 LLM models based on an LLM, a multimodal large language model (MLLM), and a large audio language model (LALM). Each model is fine-tuned with strategic prompt engineering and customized training methods to explore the emergent abilities of these models for astronomical data. Remarkably, StarWhisper LC series models exhibit high accuracies of around 90%, considerably reducing the need for explicit feature engineering, thereby paving the way for streamlined parallel data processing and the progression of multifaceted multimodal models in astronomical applications. The study furnishes 2 detailed catalogs illustrating the impacts of phase and sampling intervals on deep learning classification accuracy, showing that a substantial decrease of up to 14% in observation duration and 21% in sampling points can be realized without compromising accuracy by more than 10%. |
| format | Article |
| id | doaj-art-a656fc6b1b9e4c099556890ecbb9c80d |
| institution | OA Journals |
| issn | 2771-5892 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | American Association for the Advancement of Science (AAAS) |
| series | Intelligent Computing |
| affiliations | Yu-Yang Li, Yu Bai, Cunshi Wang, Jifeng Liu: Key Laboratory of Optical Astronomy, National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China; Mengwei Qu: State Key Laboratory of Isotope Geochemistry, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou 510640, China; Ziteng Lu: School of Foreign Studies, Tongling University, Tongling, Anhui 244061, China; Roberto Soria: College of Astronomy and Space Sciences, University of Chinese Academy of Sciences, Beijing 100049, China |
| volume | 4 |
| doi | 10.34133/icomputing.0110 |
| title | Deep Learning and Methods Based on Large Language Models Applied to Stellar Light Curve Classification |
| url | https://spj.science.org/doi/10.34133/icomputing.0110 |
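
The abstract reports that observation duration can be cut by up to 14% and the number of sampling points by 21% without losing more than 10% accuracy, and that phase distribution affects classification. The sketch below is not the authors' pipeline; it is a minimal, standard-library illustration, assuming time-sorted samples and a known period, of how such degraded versions of a light curve could be generated and how phase coverage might be quantified. The function names (`phase_fold`, `phase_coverage`, `truncate_baseline`, `thin_samples`), the 20-bin coverage metric, and the synthetic sinusoidal light curve are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration only; not the method used in the paper.


def phase_fold(times: Sequence[float], period: float, t0: float = 0.0) -> List[float]:
    """Convert timestamps to phases in [0, 1) for an assumed period and epoch t0."""
    phases = []
    for t in times:
        p = ((t - t0) / period) % 1.0
        # Float modulo can return exactly 1.0 for tiny negative inputs; wrap it to 0.0.
        phases.append(0.0 if p >= 1.0 else p)
    return phases


def phase_coverage(phases: Sequence[float], n_bins: int = 20) -> float:
    """Fraction of equal-width phase bins that contain at least one sample."""
    occupied = {min(int(p * n_bins), n_bins - 1) for p in phases}
    return len(occupied) / n_bins


def truncate_baseline(times: Sequence[float], fluxes: Sequence[float],
                      keep_fraction: float) -> Tuple[List[float], List[float]]:
    """Keep samples in the first `keep_fraction` of the baseline (times must be sorted)."""
    t_start, t_end = times[0], times[-1]
    cutoff = t_start + keep_fraction * (t_end - t_start)
    kept = [(t, f) for t, f in zip(times, fluxes) if t <= cutoff]
    return [t for t, _ in kept], [f for _, f in kept]


def thin_samples(times: Sequence[float], fluxes: Sequence[float],
                 keep_fraction: float) -> Tuple[List[float], List[float]]:
    """Keep about `keep_fraction` of the points, evenly spaced by index.

    The first and last samples are always retained, so the baseline is unchanged.
    """
    n = len(times)
    if n <= 2:
        return list(times), list(fluxes)
    m = max(2, min(n, round(keep_fraction * n)))
    # Integer arithmetic gives strictly increasing, duplicate-free indices from 0 to n - 1.
    idx = [i * (n - 1) // (m - 1) for i in range(m)]
    return [times[i] for i in idx], [fluxes[i] for i in idx]


# Synthetic (hypothetical) light curve: 1000 samples over 10 days, period 0.7 d.
PERIOD = 0.7
times = [10.0 * i / 999 for i in range(1000)]
fluxes = [1.0 + 0.1 * math.sin(2 * math.pi * t / PERIOD) for t in times]

short_t, short_f = truncate_baseline(times, fluxes, 0.86)  # 14% shorter baseline
sparse_t, sparse_f = thin_samples(times, fluxes, 0.79)     # 21% fewer points
coverage_full = phase_coverage(phase_fold(sparse_t, PERIOD))

# A 0.3-day snippet samples less than half of one 0.7-day cycle.
times_gap = [0.3 * i / 99 for i in range(100)]
coverage_gap = phase_coverage(phase_fold(times_gap, PERIOD))

print(len(short_t), len(sparse_t), coverage_full, coverage_gap)
```

Thinning by evenly spaced indices keeps the full baseline and lowers only the sampling density, while truncation keeps the density and shortens the baseline; keeping the two operations separate mirrors the abstract's distinction between observation duration and number of sampling points.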