Data cleaning and enrichment through data integration: networking the Italian academia

Bibliographic Details
Main Authors: Irene Finocchi, Alessio Martino, Fariba Ranjbar, Blerina Sinaimeri
Format: Article
Language: English
Published: Nature Portfolio 2025-02-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-025-04608-6
Collection: DOAJ
Abstract: We describe a bibliometric network characterizing co-authorship collaborations across the entire Italian academic community. The network, consisting of 38,220 nodes and 507,050 edges, is built upon two distinct data sources: faculty information provided by the Italian Ministry of University and Research and publications available in Semantic Scholar. Both nodes and edges are associated with a large variety of semantic data, including gender, bibliometric indexes, authors’ and publications’ research fields, and temporal information. Although linking records between the two original sources posed many challenges, the network has been carefully validated to assess its reliability and to characterize its graph-theoretic properties. Because it resembles social networks in several respects, the dataset can be used in experimental studies across the broad field of social network analytics as well as in more specific bibliometric contexts.
ISSN: 2052-4463
Affiliation (all four authors): Luiss Guido Carli, Department of Business and Management