Publishing neural networks in drug discovery might compromise training data privacy
Abstract This study investigates the risks of exposing confidential chemical structures when machine learning models trained on these structures are made publicly available. We use membership inference attacks, a common method to assess privacy that is largely unexplored in the context of drug discovery, to examine neural networks for molecular property prediction in a black-box setting. Our results reveal significant privacy risks across all evaluated datasets and neural network architectures. Combining multiple attacks increases these risks. Molecules from minority classes, often the most valuable in drug discovery, are particularly vulnerable. We also found that representing molecules as graphs and using message-passing neural networks may mitigate these risks. We provide a framework to assess privacy risks of classification models and molecular representations, available at https://github.com/FabianKruger/molprivacy. Our findings highlight the need for careful consideration when sharing neural networks trained on proprietary chemical structures, informing organisations and researchers about the trade-offs between data confidentiality and model openness.
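The abstract describes black-box membership inference attacks (MIAs), which test whether a given sample was in a model's training set using only the model's output probabilities. As a hedged illustration only (not the paper's actual attack or models), a minimal confidence-threshold MIA against a deliberately overfit toy model might look like this:

```python
# Minimal sketch of a black-box membership inference attack (MIA).
# Everything here is illustrative: the toy model, data, and threshold
# are assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

def train_overfit_model(X, y):
    """Toy 'model' that memorises its training points (extreme
    overfitting). Returns a function mapping a sample to P(label=1)."""
    memory = {tuple(x): int(label) for x, label in zip(X, y)}
    def predict_proba(x):
        key = tuple(x)
        if key in memory:               # seen during training: very confident
            return 0.99 if memory[key] == 1 else 0.01
        return 0.55                     # unseen: near-chance output
    return predict_proba

def mia_confidence_attack(predict_proba, x, y_true, threshold=0.9):
    """Flag x as a training member if the model's confidence in the
    true label exceeds a threshold (a classic black-box MIA signal)."""
    p1 = predict_proba(x)
    confidence = p1 if y_true == 1 else 1.0 - p1
    return confidence > threshold

# Members (training set) and non-members from the same distribution.
X_train = rng.normal(size=(20, 5))
y_train = rng.integers(0, 2, size=20)
X_out = rng.normal(size=(20, 5))
y_out = rng.integers(0, 2, size=20)

model = train_overfit_model(X_train, y_train)

member_hits = sum(mia_confidence_attack(model, x, y)
                  for x, y in zip(X_train, y_train))
nonmember_hits = sum(mia_confidence_attack(model, x, y)
                     for x, y in zip(X_out, y_out))
print(member_hits, nonmember_hits)  # overfit model: members flagged far more often
```

The gap between the member and non-member flag rates is the privacy leak the paper quantifies; a well-regularised model narrows it, while overfitting (as in this toy) widens it.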
Saved in:
| Main Authors: | Fabian P. Krüger, Johan Östman, Lewis Mervin, Igor V. Tetko, Ola Engkvist |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-03-01 |
| Series: | Journal of Cheminformatics |
| Subjects: | Membership inference attack; Privacy; Drug discovery; Cheminformatics; QSAR; Machine learning |
| Online Access: | https://doi.org/10.1186/s13321-025-00982-w |
| _version_ | 1850065064658206720 |
|---|---|
| author | Fabian P. Krüger; Johan Östman; Lewis Mervin; Igor V. Tetko; Ola Engkvist |
| author_facet | Fabian P. Krüger; Johan Östman; Lewis Mervin; Igor V. Tetko; Ola Engkvist |
| author_sort | Fabian P. Krüger |
| collection | DOAJ |
| description | Abstract This study investigates the risks of exposing confidential chemical structures when machine learning models trained on these structures are made publicly available. We use membership inference attacks, a common method to assess privacy that is largely unexplored in the context of drug discovery, to examine neural networks for molecular property prediction in a black-box setting. Our results reveal significant privacy risks across all evaluated datasets and neural network architectures. Combining multiple attacks increases these risks. Molecules from minority classes, often the most valuable in drug discovery, are particularly vulnerable. We also found that representing molecules as graphs and using message-passing neural networks may mitigate these risks. We provide a framework to assess privacy risks of classification models and molecular representations, available at https://github.com/FabianKruger/molprivacy . Our findings highlight the need for careful consideration when sharing neural networks trained on proprietary chemical structures, informing organisations and researchers about the trade-offs between data confidentiality and model openness. |
| format | Article |
| id | doaj-art-a6cea2c2b9ff4b3a92d2826ab16039e8 |
| institution | DOAJ |
| issn | 1758-2946 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | BMC |
| record_format | Article |
| series | Journal of Cheminformatics |
| spelling | doaj-art-a6cea2c2b9ff4b3a92d2826ab16039e8 2025-08-20T02:49:06Z eng BMC Journal of Cheminformatics 1758-2946 2025-03-01 Vol. 17, Iss. 1, pp. 1–17 10.1186/s13321-025-00982-w Publishing neural networks in drug discovery might compromise training data privacy. Fabian P. Krüger (Discovery Sciences, Molecular AI, AstraZeneca R&D); Johan Östman (AI Sweden); Lewis Mervin (Discovery Sciences, Molecular AI, AstraZeneca R&D); Igor V. Tetko (Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH)); Ola Engkvist (Discovery Sciences, Molecular AI, AstraZeneca R&D). https://doi.org/10.1186/s13321-025-00982-w Membership inference attack; Privacy; Drug discovery; Cheminformatics; QSAR; Machine learning |
| spellingShingle | Fabian P. Krüger; Johan Östman; Lewis Mervin; Igor V. Tetko; Ola Engkvist. Publishing neural networks in drug discovery might compromise training data privacy. Journal of Cheminformatics. Membership inference attack; Privacy; Drug discovery; Cheminformatics; QSAR; Machine learning |
| title | Publishing neural networks in drug discovery might compromise training data privacy |
| title_full | Publishing neural networks in drug discovery might compromise training data privacy |
| title_fullStr | Publishing neural networks in drug discovery might compromise training data privacy |
| title_full_unstemmed | Publishing neural networks in drug discovery might compromise training data privacy |
| title_short | Publishing neural networks in drug discovery might compromise training data privacy |
| title_sort | publishing neural networks in drug discovery might compromise training data privacy |
| topic | Membership inference attack; Privacy; Drug discovery; Cheminformatics; QSAR; Machine learning |
| url | https://doi.org/10.1186/s13321-025-00982-w |
| work_keys_str_mv | AT fabianpkruger publishingneuralnetworksindrugdiscoverymightcompromisetrainingdataprivacy AT johanostman publishingneuralnetworksindrugdiscoverymightcompromisetrainingdataprivacy AT lewismervin publishingneuralnetworksindrugdiscoverymightcompromisetrainingdataprivacy AT igorvtetko publishingneuralnetworksindrugdiscoverymightcompromisetrainingdataprivacy AT olaengkvist publishingneuralnetworksindrugdiscoverymightcompromisetrainingdataprivacy |