Knowledge Distillation for Molecular Property Prediction: A Scalability Analysis

Bibliographic Details
Main Authors: Rahul Sheshanarayana, Fengqi You
Format: Article
Language: English
Published: Wiley 2025-06-01
Series: Advanced Science
Subjects: graph neural networks; knowledge distillation; materials informatics; scalability
Online Access: https://doi.org/10.1002/advs.202503271
author Rahul Sheshanarayana
Fengqi You
collection DOAJ
description Abstract Knowledge distillation (KD) is a powerful model compression technique that transfers knowledge from complex teacher models to compact student models, reducing computational costs while preserving predictive accuracy. This study investigated KD's efficacy in molecular property prediction across domain‐specific and cross‐domain tasks, leveraging state‐of‐the‐art graph neural networks (SchNet, DimeNet++, and TensorNet). In the domain‐specific setting, KD improved regression performance across diverse quantum mechanical properties in the QM9 dataset, with DimeNet++ student models achieving up to a 90% improvement in R² compared to non‐KD baselines. Notably, in certain cases, smaller student models achieved comparable or even superior R² improvements while being 2× smaller, highlighting KD's ability to enhance efficiency without sacrificing predictive performance. Cross‐domain evaluations further demonstrated KD's adaptability, where embeddings from QM9‐trained teacher models enhanced predictions for ESOL (logS) and FreeSolv (ΔG_hyd), with SchNet exhibiting the highest gains of ≈65% in logS predictions. Embedding analysis revealed substantial student‐teacher alignment gains, with the relative shift in cosine similarity distribution peaks reaching up to 1.0 across student models. These findings highlighted KD as a robust strategy for enhancing molecular representation learning, with implications for cheminformatics, materials science, and drug discovery.
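The abstract above describes feature-level knowledge distillation between graph neural network encoders and a cosine-similarity analysis of student-teacher embedding alignment. The sketch below is not the authors' code or models; it is a generic PyTorch illustration of that recipe in which a plain MLP (here called Encoder) stands in for the molecular GNNs, and the projection layer, dimensions, and loss weight alpha are assumptions made purely for the example.

```python
# Minimal, illustrative sketch (not the authors' released code): feature-level
# knowledge distillation for a molecular property regressor. A compact "student"
# is trained on the property label while also matching frozen "teacher"
# embeddings; afterwards, student-teacher cosine similarity gives a simple
# alignment check. The encoder architecture, dimensions, projection layer, and
# loss weight `alpha` are assumptions for illustration, not values from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Stand-in for a molecular GNN encoder (e.g., SchNet / DimeNet++ / TensorNet)."""

    def __init__(self, in_dim: int, hidden: int, emb_dim: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.SiLU(),
                                  nn.Linear(hidden, emb_dim))
        self.head = nn.Linear(emb_dim, 1)  # scalar property, e.g., logS

    def forward(self, x):
        emb = self.body(x)
        return self.head(emb).squeeze(-1), emb


def kd_loss(student, teacher, proj, x, y, alpha=0.5):
    """Supervised regression loss plus an embedding-alignment (distillation) term."""
    with torch.no_grad():
        _, t_emb = teacher(x)                 # frozen teacher embeddings
    pred, s_emb = student(x)
    supervised = F.mse_loss(pred, y)          # fit the property labels
    distill = F.mse_loss(proj(s_emb), t_emb)  # pull student toward teacher space
    return supervised + alpha * distill


# Toy usage with random descriptors standing in for molecular features.
torch.manual_seed(0)
teacher = Encoder(32, 256, 128).eval()        # larger (pre-trained) teacher
student = Encoder(32, 64, 64)                 # roughly 2x smaller student
proj = nn.Linear(64, 128)                     # map student dim -> teacher dim
opt = torch.optim.Adam(list(student.parameters()) + list(proj.parameters()), lr=1e-3)

x, y = torch.randn(256, 32), torch.randn(256)
for _ in range(10):                           # a few illustrative steps
    opt.zero_grad()
    loss = kd_loss(student, teacher, proj, x, y)
    loss.backward()
    opt.step()

# Simplified analogue of the abstract's cosine-similarity alignment analysis:
# per-molecule cosine similarity between (projected) student and teacher embeddings.
with torch.no_grad():
    _, s_emb = student(x)
    _, t_emb = teacher(x)
    cos = F.cosine_similarity(proj(s_emb), t_emb, dim=-1)
print(f"mean student-teacher cosine similarity: {cos.mean().item():.3f}")
```

The distillation term pulls the student's embedding space toward the frozen teacher's, and the final cosine-similarity check mirrors, in simplified form, the embedding-alignment shift reported in the abstract.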
format Article
id doaj-art-8c2faf5d0fd343b9a7f3aa23629e0534
institution Kabale University
issn 2198-3844
language English
publishDate 2025-06-01
publisher Wiley
record_format Article
series Advanced Science
spelling doaj-art-8c2faf5d0fd343b9a7f3aa23629e0534 (2025-08-20T03:44:51Z). English. Wiley. Advanced Science, ISSN 2198-3844, vol. 12, no. 22 (2025-06-01), pp. n/a, DOI 10.1002/advs.202503271. Knowledge Distillation for Molecular Property Prediction: A Scalability Analysis. Rahul Sheshanarayana (College of Engineering, Cornell University, Ithaca, NY 14853, USA); Fengqi You (College of Engineering, Cornell University, Ithaca, NY 14853, USA). Online access: https://doi.org/10.1002/advs.202503271. Topics: graph neural networks; knowledge distillation; materials informatics; scalability.
title Knowledge Distillation for Molecular Property Prediction: A Scalability Analysis
topic graph neural networks
knowledge distillation
materials informatics
scalability
url https://doi.org/10.1002/advs.202503271