Regression-based predictive modelling of software size of fintech projects using technical specifications

Bibliographic Details
Main Authors: Iqra Kanwal, Ali Afzal Malik
Format: Article
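Language: English
Published: Mehran University of Engineering and Technology, 2025-04-01
Series: Mehran University Research Journal of Engineering and Technology
Subjects: k-fold cross validation; lines of code; multiple linear regression; size prediction model; software size prediction; technical specifications
Online Access: https://murjet.muet.edu.pk/index.php/home/article/view/298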
author Iqra Kanwal
Ali Afzal Malik
collection DOAJ
description This research develops a predictive model that estimates the lines of code (LOC) of software projects from their technical requirements specifications. It addresses the recurring problem of inaccurate effort and cost estimation in software development, which often leads to budget overruns and delays. The study analyzes a dataset of past real-life software projects and extracts candidate predictors from project requirements written in technical yet easily comprehensible natural language. A pilot study is first conducted to assess feasibility. Simple Linear Regression (SLR) is then used to compare the individual predictive strength of the eight candidate predictors identified earlier; the number of API calls is the strongest single predictor of LOC (R² = 0.670). Next, a software size prediction model is built using Forward Stepwise Multiple Linear Regression (FSMLR). The adjusted R² of the final model indicates that two factors, the number of API calls and the number of GUI fields, together explain more than 80% of the variation in code size (measured in LOC). The model is validated using k-fold cross-validation, and the validation results are promising. The mean MMRE across all folds is 0.203, meaning that the model's predictions deviate from the actual values by about 20% on average. The mean PRED(25) is 0.708, meaning that nearly 71% of predicted size values fall within 25% of the actual values. The model can help project managers make better-informed decisions about project management, budgeting, and scheduling.
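The validation figures in the description rest on standard definitions: MMRE is the mean of |actual - predicted| / actual, and PRED(25) is the fraction of estimates whose relative error is at most 0.25. The Python sketch below shows how such figures could be computed by fitting an ordinary least squares model on the two predictors named in the description (API calls and GUI fields) inside a k-fold loop. It is not the authors' code: the function names, the choice of k = 5, the unshuffled contiguous folds, and the toy project data are all assumptions for illustration, and the forward stepwise selection step is omitted.

```python
from statistics import mean


def mmre(actual, predicted):
    """Mean magnitude of relative error: mean of |actual - predicted| / actual.
    Assumes every actual size (LOC) is positive."""
    return mean(abs(a - p) / a for a, p in zip(actual, predicted))


def pred(actual, predicted, level=0.25):
    """PRED(level): fraction of estimates whose relative error is <= level."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) / a <= level)
    return hits / len(actual)


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Hypothetical helper: OLS with an intercept via the normal equations (X^T X) beta = X^T y."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, rows):
    """Apply fitted coefficients [intercept, b1, b2, ...] to predictor rows."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]


def k_fold_scores(rows, y, k=5):
    """Mean MMRE and mean PRED(25) over k contiguous (unshuffled) folds."""
    n = len(y)
    bounds = [round(i * n / k) for i in range(k + 1)]
    fold_mmre, fold_pred = [], []
    for start, stop in zip(bounds, bounds[1:]):
        # Train on every project outside the current fold, test on the fold.
        beta = fit_ols(rows[:start] + rows[stop:], y[:start] + y[stop:])
        estimates = predict(beta, rows[start:stop])
        fold_mmre.append(mmre(y[start:stop], estimates))
        fold_pred.append(pred(y[start:stop], estimates))
    return mean(fold_mmre), mean(fold_pred)


if __name__ == "__main__":
    # Hypothetical toy data (NOT from the paper): (API calls, GUI fields) per project,
    # with LOC generated as an exact linear function so the run is reproducible.
    projects = [(2, 10), (5, 3), (8, 20), (3, 15), (10, 7),
                (6, 12), (1, 25), (12, 4), (4, 18), (9, 9)]
    loc = [50 + 30 * api + 4 * gui for api, gui in projects]
    avg_mmre, avg_pred = k_fold_scores(projects, loc, k=5)
    print(f"mean MMRE = {avg_mmre:.3f}, mean PRED(25) = {avg_pred:.3f}")
```

Because the toy LOC values are an exact linear function of the two predictors, this example should report an MMRE near 0 and a PRED(25) of 1.0; real project data will show nonzero errors. Contiguous folds keep the sketch deterministic; shuffling the projects before splitting is a common alternative.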
format Article
id doaj-art-6009d8abaef541c2b0a83240d83f34e7
institution DOAJ
issn 0254-7821
2413-7219
language English
publishDate 2025-04-01
publisher Mehran University of Engineering and Technology
record_format Article
series Mehran University Research Journal of Engineering and Technology
spelling doaj-art-6009d8abaef541c2b0a83240d83f34e7; 2025-08-20T03:18:01Z; eng; Mehran University of Engineering and Technology; Mehran University Research Journal of Engineering and Technology; 0254-7821; 2413-7219; 2025-04-01; vol. 44, no. 2, pp. 164-173; 10.22581/muet1982.3289300; Iqra Kanwal, Ali Afzal Malik (both: FAST School of Computing, National University of Computer and Emerging Sciences (NUCES), Lahore)
title Regression-based predictive modelling of software size of fintech projects using technical specifications
topic k-fold cross validation
lines of code
multiple linear regression
size prediction model
software size prediction
technical specifications
url https://murjet.muet.edu.pk/index.php/home/article/view/298