Is Open Source the Future of AI? A Data-Driven Approach

Large language models (LLMs) have become central to both academic research and industrial applications, fueling debates on their accuracy, usability, privacy, and potential misuse. While proprietary models benefit from substantial investments in data and computing resources, open-sourcing is often s...

Bibliographic Details
Main Authors: Domen Vake, Bogdan Šinik, Jernej Vičič, Aleksandar Tošić
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Applied Sciences
Subjects:
Online Access: https://www.mdpi.com/2076-3417/15/5/2790
collection DOAJ
description Large language models (LLMs) have become central to both academic research and industrial applications, fueling debates on their accuracy, usability, privacy, and potential misuse. While proprietary models benefit from substantial investments in data and computing resources, open-sourcing is often suggested as a means to enhance trust and transparency. Yet, open-sourcing comes with its own challenges, such as risks of illicit applications, limited financial incentives, and intellectual property concerns. Positioned between these extremes are hybrid approaches—including partially open models and licensing restrictions—that aim to balance openness with control. In this paper, we adopt a data-driven approach to examine the open-source development of LLMs. By analyzing contributions in model improvements, modifications, and methodologies, we assess how community efforts impact model performance. Our findings indicate that the open-source community can significantly enhance models, demonstrating that community-driven modifications can yield efficiency gains without compromising performance. Moreover, our analysis reveals distinct trends in community growth and highlights which architectures benefit disproportionately from open-source engagement. These insights provide an empirical foundation to inform balanced discussions among industry experts and policymakers on the future direction of AI development.
id doaj-art-cfc3a352636b432db793eaa2708fc113
institution OA Journals
issn 2076-3417
timestamp 2025-08-20T02:05:09Z
volume 15
issue 5
article_number 2790
doi 10.3390/app15052790
affiliation UP FAMNIT, Glagoljaška 8, 6000 Koper, Slovenia (all four authors)
topic large language models
artificial intelligence
open source
data science
HuggingFace