A Multimodal Large Language Model Framework for Intelligent Perception and Decision-Making in Smart Manufacturing
In modern manufacturing, making accurate and timely decisions requires the ability to effectively handle multiple types of data. This paper presents a multimodal system designed specifically for smart manufacturing applications. The system combines various data sources including images, sensor data,...
| Main Authors: | Tianyu Wang, Bowen Zhang, Daqi Jiang, Dong Li |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Sensors |
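| Subjects: | multimodal large language model; smart manufacturing; semantic tokenization; Transformer model; decision-making |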
| Online Access: | https://www.mdpi.com/1424-8220/25/10/3072 |
| _version_ | 1850126919640547328 |
|---|---|
| author | Tianyu Wang; Bowen Zhang; Daqi Jiang; Dong Li |
| collection | DOAJ |
| description | In modern manufacturing, making accurate and timely decisions requires the ability to effectively handle multiple types of data. This paper presents a multimodal system designed specifically for smart manufacturing applications. The system combines various data sources including images, sensor data, and production records, using advanced multimodal large language models. This approach addresses common limitations of traditional single-modal methods, such as isolated data analysis and poor integration between different data types. Key contributions include a unified method for representing different data types, dynamic semantic tokenization for better data processing, strong alignment strategies across modalities, and a practical two-stage training method involving initial large-scale pretraining and later fine-tuning for specific tasks. Additionally, a novel Transformer-based model is introduced for generating both images and text, significantly improving real-time decision-making capabilities. Experiments on relevant industrial datasets show that this method consistently performs better than current state-of-the-art approaches in tasks like image–text retrieval and visual question answering. The results demonstrate the effectiveness and versatility of the proposed methods, offering important insights and practical solutions to enhance intelligent manufacturing, predictive maintenance, and anomaly detection, thus supporting the development of more efficient and reliable industrial systems. |
| format | Article |
| id | doaj-art-8d3a558968a34693a5efdbefde8e143c |
| institution | OA Journals |
| issn | 1424-8220 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Sensors |
| spelling | Record: doaj-art-8d3a558968a34693a5efdbefde8e143c (2025-08-20T02:33:48Z). Language: eng. Publisher: MDPI AG. Journal: Sensors, ISSN 1424-8220, 2025-05-01, vol. 25, no. 10, art. 3072. DOI: 10.3390/s25103072. Title: A Multimodal Large Language Model Framework for Intelligent Perception and Decision-Making in Smart Manufacturing. Authors and affiliations: Tianyu Wang (State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China); Bowen Zhang (Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China); Daqi Jiang (National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Northeastern University, Shenyang 110819, China); Dong Li (State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China). URL: https://www.mdpi.com/1424-8220/25/10/3072. Keywords: multimodal large language model; smart manufacturing; semantic tokenization; Transformer model; decision-making |
| title | A Multimodal Large Language Model Framework for Intelligent Perception and Decision-Making in Smart Manufacturing |
| topic | multimodal large language model; smart manufacturing; semantic tokenization; Transformer model; decision-making |
| url | https://www.mdpi.com/1424-8220/25/10/3072 |