Data cleaning and enrichment through data integration: networking the Italian academia
| Main Authors: | Irene Finocchi, Alessio Martino, Fariba Ranjbar, Blerina Sinaimeri |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-02-01 |
| Series: | Scientific Data |
| Online Access: | https://doi.org/10.1038/s41597-025-04608-6 |
| Field | Value |
|---|---|
| author | Irene Finocchi, Alessio Martino, Fariba Ranjbar, Blerina Sinaimeri |
| collection | DOAJ |
| description | We describe a bibliometric network characterizing co-authorship collaborations in the entire Italian academic community. The network, consisting of 38,220 nodes and 507,050 edges, is built upon two distinct data sources: faculty information provided by the Italian Ministry of University and Research and publications available in Semantic Scholar. Both nodes and edges are associated with a large variety of semantic data, including gender, bibliometric indexes, authors’ and publications’ research fields, and temporal information. While linking data between the two original sources posed many challenges, the network has been carefully validated to assess its reliability and to understand its graph-theoretic characteristics. By resembling several features of social networks, our dataset can be profitably leveraged in experimental studies in the wide social network analytics domain as well as in more specific bibliometric contexts. |
| id | doaj-art-3a83ec2b0f8a4efca5e9b3271ab7b491 |
| institution | OA Journals |
| issn | 2052-4463 |
| affiliation | Luiss Guido Carli, Department of Business and Management (all four authors) |
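The description summarizes a co-authorship network in which faculty members are nodes and joint publications create edges. The sketch below illustrates that general construction under stated assumptions; it is not the authors' pipeline. It assumes a hypothetical input format in which each paper is a list of author IDs that have already been matched to faculty records (the linking step the description calls challenging is not modeled). It also assumes, purely for illustration, that edges are weighted by the number of shared papers. The two helper functions apply the standard formulas for average degree and density of an undirected simple graph to the node and edge counts reported in the description.

```python
from collections import Counter
from itertools import combinations


def coauthorship_edges(papers, faculty_ids):
    """Build weighted co-authorship edges from a list of papers.

    Hypothetical input format: `papers` is a list of author-ID lists whose
    IDs are assumed to be already resolved to faculty records. Only authors
    in `faculty_ids` become nodes; the (assumed) edge weight counts the
    papers a pair of faculty members share.
    """
    weights = Counter()
    for authors in papers:
        # Keep each matched faculty member once per paper, in a stable order.
        matched = sorted(set(a for a in authors if a in faculty_ids))
        # Every unordered pair of co-authors on the same paper adds one to its edge.
        for u, v in combinations(matched, 2):
            weights[(u, v)] += 1
    return dict(weights)


def average_degree(num_nodes, num_edges):
    """Average degree of an undirected simple graph: 2|E| / |V|."""
    return 2 * num_edges / num_nodes


def density(num_nodes, num_edges):
    """Share of possible undirected edges present: 2|E| / (|V|(|V| - 1))."""
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


if __name__ == "__main__":
    # Toy data: "X" and "Y" are non-faculty co-authors and are dropped.
    faculty = {"A", "B", "C"}
    papers = [["A", "B", "X"], ["A", "B", "C"], ["C", "Y"]]
    print(coauthorship_edges(papers, faculty))
    # prints {('A', 'B'): 2, ('A', 'C'): 1, ('B', 'C'): 1}

    # Counts reported in the description: 38,220 nodes and 507,050 edges.
    print(round(average_degree(38_220, 507_050), 2))  # prints 26.53
    print(f"{density(38_220, 507_050):.2e}")  # prints 6.94e-04
```

Applied to the reported counts, the formulas give an average of roughly 26.5 co-authors per faculty member and a density of about 0.07%, that is, a sparse graph, which is typical of social networks.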