An in‐depth study of the effects of methods on the dataset selection of public development projects
Abstract Public development projects (PDPs) and documented public development projects (DPDPs) are two types of projects that can provide valuable information on how developers and users participate in OSS projects. However, it is hard for researchers to effectively select PDPs and DPDPs due to the...
Main Authors: | Can Cheng, Bing Li, Zengyang Li, Peng Liang, Xu Yang |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2022-04-01 |
Series: | IET Software |
Subjects: | data mining, software engineering |
Online Access: | https://doi.org/10.1049/sfw2.12050 |
_version_ | 1832559629768327168 |
---|---|
author | Can Cheng; Bing Li; Zengyang Li; Peng Liang; Xu Yang |
author_facet | Can Cheng; Bing Li; Zengyang Li; Peng Liang; Xu Yang |
author_sort | Can Cheng |
collection | DOAJ |
description | Abstract Public development projects (PDPs) and documented public development projects (DPDPs) are two types of projects that can provide valuable information on how developers and users participate in OSS projects. However, it is hard for researchers to effectively select PDPs and DPDPs due to the lack of specific project selection methods for these two types of projects. To address this problem, a standard dataset was labelled, and the baseline methods (i.e. selecting projects according to a single feature such as star number) under 60 configurations and the machine learning methods under 18 configurations were tested to identify the best configurations in precision and F‐measure for selecting PDPs and DPDPs. The results show that (1) to select PDPs or DPDPs with a high precision, the baseline method is the best, with a precision of 0.877 (PDPs) and 0.831 (DPDPs); (2) to select PDPs or DPDPs with a high F‐measure, the machine learning methods are the best, with an F‐measure of 0.817 (PDPs) and 0.789 (DPDPs); (3) existing sample selection strategies can be combined with the machine learning methods, and the precision of selecting PDPs can be increased by 6.39%–41.33% and the precision of selecting DPDPs can be increased by 35.50%–269.02%. |
format | Article |
id | doaj-art-6fe40b02bee84944a4c421ab4e5c2cf5 |
institution | Kabale University |
issn | 1751-8806; 1751-8814 |
language | English |
publishDate | 2022-04-01 |
publisher | Wiley |
record_format | Article |
series | IET Software |
spelling | doaj-art-6fe40b02bee84944a4c421ab4e5c2cf5; 2025-02-03T01:29:38Z; eng; Wiley; IET Software; ISSN 1751-8806, 1751-8814; 2022-04-01; vol. 16, no. 2, pp. 146–166; doi 10.1049/sfw2.12050; An in‐depth study of the effects of methods on the dataset selection of public development projects; Can Cheng (School of Computer Science, Wuhan University, Wuhan, China); Bing Li (School of Computer Science, Wuhan University, Wuhan, China); Zengyang Li (School of Computer Science, Central China Normal University, Wuhan, China); Peng Liang (School of Computer Science, Wuhan University, Wuhan, China); Xu Yang (Huawei Technologies, Nanjing, China); https://doi.org/10.1049/sfw2.12050; data mining; software engineering |
title | An in‐depth study of the effects of methods on the dataset selection of public development projects |
title_sort | in depth study of the effects of methods on the dataset selection of public development projects |
topic | data mining; software engineering |
url | https://doi.org/10.1049/sfw2.12050 |