Natural language access point to digital metal–organic polyhedra chemistry in The World Avatar

Bibliographic Details
Main Authors: Simon D. Rihm, Dan N. Tran, Aleksandar Kondinski, Laura Pascazio, Fabio Saluz, Xinhong Deng, Sebastian Mosbach, Jethro Akroyd, Markus Kraft
Format: Article
Language: English
Published: Cambridge University Press 2025-01-01
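Series: Data-Centric Engineering
Subjects: dynamic knowledge graphs; metal–organic polyhedra; question-answering systems; retrieval-augmented generation
Online Access: https://www.cambridge.org/core/product/identifier/S2632673625000127/type/journal_article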
author Simon D. Rihm
Dan N. Tran
Aleksandar Kondinski
Laura Pascazio
Fabio Saluz
Xinhong Deng
Sebastian Mosbach
Jethro Akroyd
Markus Kraft
collection DOAJ
description Metal–organic polyhedra (MOPs) are discrete, porous metal–organic assemblies known for their wide-ranging applications in separation, drug delivery, and catalysis. As part of The World Avatar (TWA) project—a universal and interoperable knowledge model—we have previously systematized known MOPs and expanded the explorable MOP space with novel targets. Although these data are available via a complex query language, a more user-friendly interface is desirable to enhance accessibility. To address a similar challenge in other chemistry domains, the natural language question-answering system “Marie” has been developed; however, its scalability is limited due to its reliance on supervised fine-tuning, which hinders its adaptability to new knowledge domains. In this article, we introduce an enhanced database of MOPs and a first-of-its-kind question-answering system tailored for MOP chemistry. By augmenting TWA’s MOP database with geometry data, we enable the visualization of not just empirically verified MOP structures but also machine-predicted ones. In addition, we renovated Marie’s semantic parser to adopt in-context few-shot learning, allowing seamless interaction with TWA’s extensive MOP repository. These advancements significantly improve the accessibility and versatility of TWA, marking an important step toward accelerating and automating the development of reticular materials with the aid of digital assistants.
format Article
id doaj-art-c3fe60ec4e65461994783e9eba5ea0b3
institution DOAJ
issn 2632-6736
language English
publishDate 2025-01-01
publisher Cambridge University Press
record_format Article
series Data-Centric Engineering
spelling doaj-art-c3fe60ec4e65461994783e9eba5ea0b3
2025-08-20T02:47:49Z
eng
Cambridge University Press
Data-Centric Engineering
2632-6736
2025-01-01
6
10.1017/dce.2025.12
Natural language access point to digital metal–organic polyhedra chemistry in The World Avatar
Simon D. Rihm, https://orcid.org/0000-0001-8342-7269, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
Dan N. Tran, https://orcid.org/0000-0002-8980-7200, CARES, Cambridge Centre for Advanced Research and Education in Singapore, Singapore, Singapore
Aleksandar Kondinski, https://orcid.org/0000-0002-0559-0172, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
Laura Pascazio, https://orcid.org/0000-0003-4084-995X, CARES, Cambridge Centre for Advanced Research and Education in Singapore, Singapore, Singapore
Fabio Saluz, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK; Department of Mechanical and Process Engineering, ETH Zurich, Zurich, Switzerland
Xinhong Deng, CARES, Cambridge Centre for Advanced Research and Education in Singapore, Singapore, Singapore
Sebastian Mosbach, https://orcid.org/0000-0001-7018-9433, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK; CARES, Cambridge Centre for Advanced Research and Education in Singapore, Singapore, Singapore; CMCL, Cambridge, UK
Jethro Akroyd, https://orcid.org/0000-0002-2143-8656, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK; CARES, Cambridge Centre for Advanced Research and Education in Singapore, Singapore, Singapore; CMCL, Cambridge, UK
Markus Kraft, https://orcid.org/0000-0002-4293-8924, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK; CARES, Cambridge Centre for Advanced Research and Education in Singapore, Singapore, Singapore; CMCL, Cambridge, UK
https://www.cambridge.org/core/product/identifier/S2632673625000127/type/journal_article
dynamic knowledge graphs
metal–organic polyhedra
question-answering systems
retrieval-augmented generation
title Natural language access point to digital metal–organic polyhedra chemistry in The World Avatar
topic dynamic knowledge graphs
metal–organic polyhedra
question-answering systems
retrieval-augmented generation
url https://www.cambridge.org/core/product/identifier/S2632673625000127/type/journal_article