Generating diversity and securing completeness in algorithmic retrosynthesis

Bibliographic Details
Main Authors: Florian Mrugalla, Christopher Franz, Yannic Alber, Georg Mogk, Martín Villalba, Thomas Mrziglod, Kevin Schewior
Format: Article
Language: English
Published: BMC, 2025-05-01
Series: Journal of Cheminformatics
Subjects: Computer-Assisted Synthesis Planning (CASP); Retrosynthesis; DFPN; Chemical diversity score
Online Access:https://doi.org/10.1186/s13321-025-00981-x
Affiliations: Bayer AG; Department of Mathematics and Computer Science, University of Cologne
ISSN: 1758-2946
Collection: DOAJ

Abstract

Chemical synthesis planning has considerably benefited from advances in machine learning. Neural networks can reliably and accurately predict reactions leading to a given, possibly complex, molecule. In this work we focus on algorithms for assembling such predictions into a full synthesis plan that, starting from simple building blocks, produces a given target molecule, a procedure known as retrosynthesis. Objective functions for this task are hard to define and context-specific. To generate a diverse set of synthesis plans for chemists to select from, we capture the concept of diversity in a novel chemical diversity score (CDS). Our experiments show that our algorithm outperforms Monte Carlo Tree Search, the algorithm predominantly employed in this domain, both in diversity as measured by our score and in time efficiency.

Scientific Contribution: We adapt Depth-First Proof-Number Search (DFPN) and its variants, which have been applied to retrosynthesis before, to produce a set of solutions, with an explicit focus on diversity. We also make progress on understanding DFPN in terms of completeness, i.e., the ability to find a solution whenever one exists. DFPN is known to be incomplete, and we provide a much cleaner example of this incompleteness; we also show that it is complete when reinforced with a threshold-controlling routine from the literature.

Source code: https://github.com/Bayer-Group/bayer-retrosynthesis-search
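The abstract names Depth-First Proof-Number Search (DFPN) without describing how it works. As background, DFPN builds on proof-number search over an AND/OR tree. Molecules are OR nodes, where one working reaction suffices. Reactions are AND nodes, where every reactant must be obtained. The Python sketch below shows plain best-first proof-number search on such a tree. It is not the authors' DFPN variant and omits their diversity objective, the threshold-controlling routine, and transposition handling. The toy `reactions` dictionary stands in for a neural one-step reaction model. All names (`Node`, `expand`, `proof_number_search`, `extract_route`, and the molecules and reactions in the example) are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass, field

INF = math.inf


@dataclass
class Node:
    """AND/OR tree node (hypothetical structure for illustration).

    Molecule nodes are OR nodes: one working reaction suffices.
    Reaction nodes are AND nodes: every reactant must be obtained.
    """
    name: str
    is_reaction: bool
    children: list = field(default_factory=list)
    expanded: bool = False
    pn: float = 1  # proof number: leaves still to prove for a route
    dn: float = 1  # disproof number: leaves still to refute


def update(node: Node) -> None:
    """Recompute the proof and disproof numbers of an expanded inner node."""
    if node.is_reaction:  # AND node: all reactants are needed
        node.pn = sum(c.pn for c in node.children)
        node.dn = min(c.dn for c in node.children)
    else:  # OR node: any single reaction suffices
        node.pn = min(c.pn for c in node.children)
        node.dn = sum(c.dn for c in node.children)


def expand(mol: Node, reactions: dict, building_blocks: set) -> None:
    """Expand a molecule leaf; `reactions` stands in for a one-step model."""
    mol.expanded = True
    if mol.name in building_blocks:
        mol.pn, mol.dn = 0, INF  # purchasable: proven
        return
    options = reactions.get(mol.name, [])
    if not options:
        mol.pn, mol.dn = INF, 0  # no known reaction: disproven
        return
    for rxn_name, reactants in options:  # assumes each reaction has >= 1 reactant
        rxn = Node(rxn_name, True, [Node(r, False) for r in reactants], expanded=True)
        update(rxn)
        mol.children.append(rxn)
    update(mol)


def proof_number_search(target, reactions, building_blocks, max_iter=1000):
    """Best-first proof-number search; stops once the target is (dis)proven."""
    root = Node(target, False)
    for _ in range(max_iter):
        if root.pn == 0 or root.dn == 0:
            break
        # Descend to a most-proving leaf, recording the path from the root.
        path, node = [root], root
        while node.expanded:
            if node.is_reaction:  # AND: follow the child easiest to refute
                node = min(node.children, key=lambda c: c.dn)
            else:  # OR: follow the child easiest to prove
                node = min(node.children, key=lambda c: c.pn)
            path.append(node)
        expand(node, reactions, building_blocks)
        # Back up the new numbers along the path (tree search: no transpositions).
        for ancestor in reversed(path[:-1]):
            update(ancestor)
    return root


def extract_route(node: Node) -> list:
    """Reaction names of one proven route, in pre-order."""
    if node.is_reaction:
        route = [node.name]
        for child in node.children:
            route += extract_route(child)
        return route
    if node.children:  # proven molecule that is not a building block
        return extract_route(next(c for c in node.children if c.pn == 0))
    return []  # building block


if __name__ == "__main__":
    # Toy data (hypothetical): T <- r1(X, A) or r2(Y); X <- r3(D); Y <- r4(B, C).
    toy_reactions = {
        "T": [("r1", ["X", "A"]), ("r2", ["Y"])],
        "X": [("r3", ["D"])],  # D has no known reaction, so this branch fails
        "Y": [("r4", ["B", "C"])],
    }
    root = proof_number_search("T", toy_reactions, {"A", "B", "C"})
    print(extract_route(root))  # ['r2', 'r4']
```

This sketch stops at the first proven route and keeps the whole tree in memory. DFPN instead explores depth-first under proof- and disproof-number thresholds. The completeness results summarized above concern that thresholded search, which this sketch does not reproduce.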