Is Open Source the Future of AI? A Data-Driven Approach
Large language models (LLMs) have become central to both academic research and industrial applications, fueling debates on their accuracy, usability, privacy, and potential misuse. While proprietary models benefit from substantial investments in data and computing resources, open-sourcing is often s...
| Main Authors: | Domen Vake, Bogdan Šinik, Jernej Vičič, Aleksandar Tošić |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Applied Sciences |
| Subjects: | large language models; artificial intelligence; open source; data science; HuggingFace |
| Online Access: | https://www.mdpi.com/2076-3417/15/5/2790 |
| _version_ | 1850226190149746688 |
|---|---|
| author | Domen Vake; Bogdan Šinik; Jernej Vičič; Aleksandar Tošić |
| author_sort | Domen Vake |
| collection | DOAJ |
| description | Large language models (LLMs) have become central to both academic research and industrial applications, fueling debates on their accuracy, usability, privacy, and potential misuse. While proprietary models benefit from substantial investments in data and computing resources, open-sourcing is often suggested as a means to enhance trust and transparency. Yet, open-sourcing comes with its own challenges, such as risks of illicit applications, limited financial incentives, and intellectual property concerns. Positioned between these extremes are hybrid approaches—including partially open models and licensing restrictions—that aim to balance openness with control. In this paper, we adopt a data-driven approach to examine the open-source development of LLMs. By analyzing contributions in model improvements, modifications, and methodologies, we assess how community efforts impact model performance. Our findings indicate that the open-source community can significantly enhance models, demonstrating that community-driven modifications can yield efficiency gains without compromising performance. Moreover, our analysis reveals distinct trends in community growth and highlights which architectures benefit disproportionately from open-source engagement. These insights provide an empirical foundation to inform balanced discussions among industry experts and policymakers on the future direction of AI development. |
| format | Article |
| id | doaj-art-cfc3a352636b432db793eaa2708fc113 |
| institution | OA Journals |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-cfc3a352636b432db793eaa2708fc113; 2025-08-20T02:05:09Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-03-01; 15; 5; 2790; 10.3390/app15052790; Is Open Source the Future of AI? A Data-Driven Approach; Domen Vake (0), Bogdan Šinik (1), Jernej Vičič (2), Aleksandar Tošić (3); UP FAMNIT, Glagoljaška 8, 6000 Koper, Slovenia (affiliation of all four authors); https://www.mdpi.com/2076-3417/15/5/2790; large language models; artificial intelligence; open source; data science; HuggingFace |
| title | Is Open Source the Future of AI? A Data-Driven Approach |
| title_sort | is open source the future of ai a data driven approach |
| topic | large language models; artificial intelligence; open source; data science; HuggingFace |
| url | https://www.mdpi.com/2076-3417/15/5/2790 |