Generating diversity and securing completeness in algorithmic retrosynthesis
Abstract: Chemical synthesis planning has benefited considerably from advances in machine learning. Neural networks can reliably and accurately predict reactions leading to a given, possibly complex, molecule. In this work we focus on algorithms for assembling such predictions into a full...
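| Main Authors: | Florian Mrugalla, Christopher Franz, Yannic Alber, Georg Mogk, Martín Villalba, Thomas Mrziglod, Kevin Schewior |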
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-05-01 |
| Series: | Journal of Cheminformatics |
| Subjects: | Computer-Assisted Synthesis Planning (CASP), Retrosynthesis, DFPN, Chemical diversity score |
| Online Access: | https://doi.org/10.1186/s13321-025-00981-x |
| _version_ | 1850154746087735296 |
|---|---|
| author | Florian Mrugalla, Christopher Franz, Yannic Alber, Georg Mogk, Martín Villalba, Thomas Mrziglod, Kevin Schewior |
| author_sort | Florian Mrugalla |
| collection | DOAJ |
| description | Abstract: Chemical synthesis planning has benefited considerably from advances in machine learning. Neural networks can reliably and accurately predict reactions leading to a given, possibly complex, molecule. In this work we focus on algorithms for assembling such predictions into a full synthesis plan that, starting from simple building blocks, produces a given target molecule, a procedure known as retrosynthesis. Objective functions for this task are hard to define and context-specific. In order to generate a diverse set of synthesis plans for chemists to select from, we capture the concept of diversity in a novel chemical diversity score (CDS). Our experiments show that our algorithm outperforms Monte-Carlo Tree Search, the algorithm predominantly employed in this domain, both in diversity as measured by our score and in time efficiency. Scientific Contribution: We adapt Depth-First Proof-Number Search (DFPN) and its variants, which have been applied to retrosynthesis before, to produce a set of solutions, with an explicit focus on diversity (the accompanying source code is available at https://github.com/Bayer-Group/bayer-retrosynthesis-search). We also make progress on understanding DFPN in terms of completeness, i.e., the ability to find a solution whenever one exists. DFPN is known to be incomplete, for which we provide a much cleaner example, but we also show that it is complete when reinforced with a threshold-controlling routine from the literature. |
| format | Article |
| id | doaj-art-72fa2cecb7fa4af6852517867ee68eaa |
| institution | OA Journals |
| issn | 1758-2946 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | BMC |
| record_format | Article |
| series | Journal of Cheminformatics |
| spelling | doaj-art-72fa2cecb7fa4af6852517867ee68eaa; 2025-08-20T02:25:12Z; eng; BMC; Journal of Cheminformatics; ISSN 1758-2946; 2025-05-01; vol. 17, no. 1, pp. 1-19; https://doi.org/10.1186/s13321-025-00981-x; Generating diversity and securing completeness in algorithmic retrosynthesis; Florian Mrugalla (Bayer AG), Christopher Franz, Yannic Alber (Bayer AG), Georg Mogk (Bayer AG), Martín Villalba, Thomas Mrziglod (Bayer AG), Kevin Schewior (Department of Mathematics and Computer Science, University of Cologne); keywords: Computer-Assisted Synthesis Planning (CASP), Retrosynthesis, DFPN, Chemical diversity score |
| title | Generating diversity and securing completeness in algorithmic retrosynthesis |
| topic | Computer-Assisted Synthesis Planning (CASP); Retrosynthesis; DFPN; Chemical diversity score |
| url | https://doi.org/10.1186/s13321-025-00981-x |