A FAIR Resource Recommender System for Smart Open Scientific Inquiries
A vast proportion of scientific data remains locked behind dynamic web interfaces, often called the deep web—inaccessible to conventional search engines and standard crawlers. This gap between data availability and machine usability hampers the goals of open science and automation. While registries...
| Main Authors: | Syed N. Sakib, Sajratul Y. Rubaiat, Kallol Naha, Hasan H. Rahman, Hasan M. Jamil |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-07-01 |
| Series: | Applied Sciences |
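| Subjects: | open science; large language model; intelligent user interface; FAIR; recommender system; linked open data |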
| Online Access: | https://www.mdpi.com/2076-3417/15/15/8334 |
| _version_ | 1849407556394418176 |
|---|---|
| author | Syed N. Sakib; Sajratul Y. Rubaiat; Kallol Naha; Hasan H. Rahman; Hasan M. Jamil |
| author_facet | Syed N. Sakib; Sajratul Y. Rubaiat; Kallol Naha; Hasan H. Rahman; Hasan M. Jamil |
| author_sort | Syed N. Sakib |
| collection | DOAJ |
| description | A vast proportion of scientific data remains locked behind dynamic web interfaces, often called the deep web—inaccessible to conventional search engines and standard crawlers. This gap between data availability and machine usability hampers the goals of open science and automation. While registries like FAIRsharing offer structured metadata describing data standards, repositories, and policies aligned with the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, they do not enable seamless, programmatic access to the underlying datasets. We present FAIRFind, a system designed to bridge this accessibility gap. FAIRFind autonomously discovers, interprets, and operationalizes access paths to biological databases on the deep web, regardless of their FAIR compliance. Central to our approach is the Deep Web Communication Protocol (DWCP), a resource description language that represents web forms, HyperText Markup Language (HTML) tables, and file-based data interfaces in a machine-actionable format. Leveraging large language models (LLMs), FAIRFind combines a specialized deep web crawler and web-form comprehension engine to transform passive web metadata into executable workflows. By indexing and embedding these workflows, FAIRFind enables natural language querying over diverse biological data sources and returns structured, source-resolved results. Evaluation across multiple open-source LLMs and database types demonstrates over 90% success in structured data extraction and high semantic retrieval accuracy. FAIRFind advances existing registries by turning linked resources from static references into actionable endpoints, laying a foundation for intelligent, autonomous data discovery across scientific domains. |
| format | Article |
| id | doaj-art-139eff601f04496997db945f74611583 |
| institution | Kabale University |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-139eff601f04496997db945f74611583; 2025-08-20T03:36:02Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2025-07-01; 15(15), 8334; DOI 10.3390/app15158334; A FAIR Resource Recommender System for Smart Open Scientific Inquiries; Syed N. Sakib, Sajratul Y. Rubaiat, Kallol Naha, Hasan H. Rahman, Hasan M. Jamil (all: Department of Computer Science, University of Idaho, Moscow, ID 83844, USA); https://www.mdpi.com/2076-3417/15/15/8334; open science; large language model; intelligent user interface; FAIR; recommender system; linked open data |
| title | A FAIR Resource Recommender System for Smart Open Scientific Inquiries |
| title_sort | fair resource recommender system for smart open scientific inquiries |
| topic | open science; large language model; intelligent user interface; FAIR; recommender system; linked open data |
| url | https://www.mdpi.com/2076-3417/15/15/8334 |