Instruction multi-constraint molecular generation using a teacher-student large language model
Abstract Background While various models and computational tools have been proposed for structure and property analysis of molecules, generating molecules that conform to all desired structures and properties remains a challenge. Results We introduce a multi-constraint molecular generation large lan...
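| Main Authors: | Peng Zhou, Jianmin Wang, Chunyan Li, Zixu Wang, Yiping Liu, Siqi Sun, Jianxin Lin, Leyi Wei, Xibao Cai, Houtim Lai, Wei Liu, Longyue Wang, Yuansheng Liu, Xiangxiang Zeng |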
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-04-01 |
| Series: | BMC Biology |
| Subjects: | Molecular generation; Large language model; Multi-constraint |
| Online Access: | https://doi.org/10.1186/s12915-025-02200-3 |
| _version_ | 1850139054820032512 |
|---|---|
| author | Peng Zhou; Jianmin Wang; Chunyan Li; Zixu Wang; Yiping Liu; Siqi Sun; Jianxin Lin; Leyi Wei; Xibao Cai; Houtim Lai; Wei Liu; Longyue Wang; Yuansheng Liu; Xiangxiang Zeng |
| author_facet | Peng Zhou; Jianmin Wang; Chunyan Li; Zixu Wang; Yiping Liu; Siqi Sun; Jianxin Lin; Leyi Wei; Xibao Cai; Houtim Lai; Wei Liu; Longyue Wang; Yuansheng Liu; Xiangxiang Zeng |
| author_sort | Peng Zhou |
| collection | DOAJ |
| description | Abstract. Background: While various models and computational tools have been proposed for structure and property analysis of molecules, generating molecules that conform to all desired structures and properties remains a challenge. Results: We introduce a multi-constraint molecular generation large language model, TSMMG, which, akin to a student, incorporates knowledge from various small models and tools, namely, the “teachers.” To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from these “teachers,” enabling it to generate novel molecules that conform to the descriptions given in various text prompts. We show experimentally that TSMMG performs remarkably well in generating molecules that meet complex property requirements described in natural language across two-, three-, and four-constraint tasks, with an average molecular validity of over 99% and success ratios of 82.58%, 68.03%, and 67.48%, respectively. The model also exhibits adaptability in zero-shot testing, creating molecules that satisfy combinations of properties it has not encountered. It can comprehend text inputs in various language styles, extending beyond the confines of the outlined prompts. Conclusions: TSMMG presents an effective model for multi-constraint molecular generation using natural language. This framework is not only applicable to drug discovery but also serves as a reference for other related fields. |
| format | Article |
| id | doaj-art-59baed04f1c94afc977fdcb61edddcf2 |
| institution | OA Journals |
| issn | 1741-7007 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | BMC |
| record_format | Article |
| series | BMC Biology |
| spelling | doaj-art-59baed04f1c94afc977fdcb61edddcf2; 2025-08-20T02:30:26Z; eng; BMC; BMC Biology; 1741-7007; 2025-04-01; vol. 23, no. 1, pp. 1-17; 10.1186/s12915-025-02200-3; Instruction multi-constraint molecular generation using a teacher-student large language model; Peng Zhou (College of Information Science and Engineering, Hunan University); Jianmin Wang (The Interdisciplinary Graduate Program in Integrative Biotechnology, Yonsei University); Chunyan Li (School of Informatics, Yunnan Normal University); Zixu Wang (Department of Computer Science, University of Tsukuba); Yiping Liu (College of Information Science and Engineering, Hunan University); Siqi Sun (Research Institute of Intelligent Complex Systems, Fudan University); Jianxin Lin (College of Information Science and Engineering, Hunan University); Leyi Wei (Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University); Xibao Cai (College of Information Science and Engineering, Hunan University); Houtim Lai (AI for Life Sciences Lab, Tencent); Wei Liu (AI for Life Sciences Lab, Tencent); Longyue Wang (Alibaba International Digital Commerce); Yuansheng Liu (College of Information Science and Engineering, Hunan University); Xiangxiang Zeng (College of Information Science and Engineering, Hunan University); Abstract. Background: While various models and computational tools have been proposed for structure and property analysis of molecules, generating molecules that conform to all desired structures and properties remains a challenge. Results: We introduce a multi-constraint molecular generation large language model, TSMMG, which, akin to a student, incorporates knowledge from various small models and tools, namely, the “teachers.” To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from these “teachers,” enabling it to generate novel molecules that conform to the descriptions given in various text prompts. We show experimentally that TSMMG performs remarkably well in generating molecules that meet complex property requirements described in natural language across two-, three-, and four-constraint tasks, with an average molecular validity of over 99% and success ratios of 82.58%, 68.03%, and 67.48%, respectively. The model also exhibits adaptability in zero-shot testing, creating molecules that satisfy combinations of properties it has not encountered. It can comprehend text inputs in various language styles, extending beyond the confines of the outlined prompts. Conclusions: TSMMG presents an effective model for multi-constraint molecular generation using natural language. This framework is not only applicable to drug discovery but also serves as a reference for other related fields.; https://doi.org/10.1186/s12915-025-02200-3; Molecular generation; Large language model; Multi-constraint |
| spellingShingle | Peng Zhou; Jianmin Wang; Chunyan Li; Zixu Wang; Yiping Liu; Siqi Sun; Jianxin Lin; Leyi Wei; Xibao Cai; Houtim Lai; Wei Liu; Longyue Wang; Yuansheng Liu; Xiangxiang Zeng; Instruction multi-constraint molecular generation using a teacher-student large language model; BMC Biology; Molecular generation; Large language model; Multi-constraint |
| title | Instruction multi-constraint molecular generation using a teacher-student large language model |
| title_full | Instruction multi-constraint molecular generation using a teacher-student large language model |
| title_fullStr | Instruction multi-constraint molecular generation using a teacher-student large language model |
| title_full_unstemmed | Instruction multi-constraint molecular generation using a teacher-student large language model |
| title_short | Instruction multi-constraint molecular generation using a teacher-student large language model |
| title_sort | instruction multi constraint molecular generation using a teacher student large language model |
| topic | Molecular generation; Large language model; Multi-constraint |
| url | https://doi.org/10.1186/s12915-025-02200-3 |
| work_keys_str_mv | AT pengzhou instructionmulticonstraintmoleculargenerationusingateacherstudentlargelanguagemodel AT jianminwang instructionmulticonstraintmoleculargenerationusingateacherstudentlargelanguagemodel AT chunyanli instructionmulticonstraintmoleculargenerationusingateacherstudentlargelanguagemodel AT zixuwang instructionmulticonstraintmoleculargenerationusingateacherstudentlargelanguagemodel AT yipingliu instructionmulticonstraintmoleculargenerationusingateacherstudentlargelanguagemodel AT siqisun instructionmulticonstraintmoleculargenerationusingateacherstudentlargelanguagemodel AT jianxinlin instructionmulticonstraintmoleculargenerationusingateacherstudentlargelanguagemodel AT leyiwei instructionmulticonstraintmoleculargenerationusingateacherstudentlargelanguagemodel AT xibaocai instructionmulticonstraintmoleculargenerationusingateacherstudentlargelanguagemodel AT houtimlai instructionmulticonstraintmoleculargenerationusingateacherstudentlargelanguagemodel AT weiliu instructionmulticonstraintmoleculargenerationusingateacherstudentlargelanguagemodel AT longyuewang instructionmulticonstraintmoleculargenerationusingateacherstudentlargelanguagemodel AT yuanshengliu instructionmulticonstraintmoleculargenerationusingateacherstudentlargelanguagemodel AT xiangxiangzeng instructionmulticonstraintmoleculargenerationusingateacherstudentlargelanguagemodel |