Is Open Source the Future of AI? A Data-Driven Approach

Bibliographic Details
Main Authors:	Domen Vake, Bogdan Šinik, Jernej Vičič, Aleksandar Tošić
Format:	Article
Language:	English
Published:	MDPI AG 2025-03-01
Series:	Applied Sciences
Subjects:	large language models; artificial intelligence; open source; data science; HuggingFace
Online Access:	https://www.mdpi.com/2076-3417/15/5/2790

DOI:	10.3390/app15052790
ISSN:	2076-3417
Volume/Issue:	15(5), article 2790
Affiliation:	UP FAMNIT, Glagoljaška 8, 6000 Koper, Slovenia (all authors)
Collection:	DOAJ
Record ID:	doaj-art-cfc3a352636b432db793eaa2708fc113
Description:	Large language models (LLMs) have become central to both academic research and industrial applications, fueling debates on their accuracy, usability, privacy, and potential misuse. While proprietary models benefit from substantial investments in data and computing resources, open-sourcing is often suggested as a means to enhance trust and transparency. Yet, open-sourcing comes with its own challenges, such as risks of illicit applications, limited financial incentives, and intellectual property concerns. Positioned between these extremes are hybrid approaches (including partially open models and licensing restrictions) that aim to balance openness with control. In this paper, we adopt a data-driven approach to examine the open-source development of LLMs. By analyzing contributions in model improvements, modifications, and methodologies, we assess how community efforts impact model performance. Our findings indicate that the open-source community can significantly enhance models, demonstrating that community-driven modifications can yield efficiency gains without compromising performance. Moreover, our analysis reveals distinct trends in community growth and highlights which architectures benefit disproportionately from open-source engagement. These insights provide an empirical foundation to inform balanced discussions among industry experts and policymakers on the future direction of AI development.

Similar Items