A FAIR Resource Recommender System for Smart Open Scientific Inquiries

A vast proportion of scientific data remains locked behind dynamic web interfaces, often called the deep web—inaccessible to conventional search engines and standard crawlers. This gap between data availability and machine usability hampers the goals of open science and automation. While registries...

Bibliographic Details
Main Authors: Syed N. Sakib, Sajratul Y. Rubaiat, Kallol Naha, Hasan H. Rahman, Hasan M. Jamil
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Applied Sciences
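Subjects: open science; large language model; intelligent user interface; FAIR; recommender system; linked open data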
Online Access: https://www.mdpi.com/2076-3417/15/15/8334
author Syed N. Sakib
Sajratul Y. Rubaiat
Kallol Naha
Hasan H. Rahman
Hasan M. Jamil
collection DOAJ
description A vast proportion of scientific data remains locked behind dynamic web interfaces, often called the deep web—inaccessible to conventional search engines and standard crawlers. This gap between data availability and machine usability hampers the goals of open science and automation. While registries like FAIRsharing offer structured metadata describing data standards, repositories, and policies aligned with the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, they do not enable seamless, programmatic access to the underlying datasets. We present FAIRFind, a system designed to bridge this accessibility gap. FAIRFind autonomously discovers, interprets, and operationalizes access paths to biological databases on the deep web, regardless of their FAIR compliance. Central to our approach is the Deep Web Communication Protocol (DWCP), a resource description language that represents web forms, HyperText Markup Language (HTML) tables, and file-based data interfaces in a machine-actionable format. Leveraging large language models (LLMs), FAIRFind combines a specialized deep web crawler and web-form comprehension engine to transform passive web metadata into executable workflows. By indexing and embedding these workflows, FAIRFind enables natural language querying over diverse biological data sources and returns structured, source-resolved results. Evaluation across multiple open-source LLMs and database types demonstrates over 90% success in structured data extraction and high semantic retrieval accuracy. FAIRFind advances existing registries by turning linked resources from static references into actionable endpoints, laying a foundation for intelligent, autonomous data discovery across scientific domains.
format Article
id doaj-art-139eff601f04496997db945f74611583
institution Kabale University
issn 2076-3417
language English
publishDate 2025-07-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-139eff601f04496997db945f74611583; 2025-08-20T03:36:02Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2025-07-01; Volume 15, Issue 15, Article 8334; DOI 10.3390/app15158334
A FAIR Resource Recommender System for Smart Open Scientific Inquiries
Syed N. Sakib; Sajratul Y. Rubaiat; Kallol Naha; Hasan H. Rahman; Hasan M. Jamil (all authors: Department of Computer Science, University of Idaho, Moscow, ID 83844, USA)
https://www.mdpi.com/2076-3417/15/15/8334
open science; large language model; intelligent user interface; FAIR; recommender system; linked open data
title A FAIR Resource Recommender System for Smart Open Scientific Inquiries
topic open science
large language model
intelligent user interface
FAIR
recommender system
linked open data
url https://www.mdpi.com/2076-3417/15/15/8334