Regression-based predictive modelling of software size of fintech projects using technical specifications

Bibliographic Details
Main Authors: Iqra Kanwal, Ali Afzal Malik
Format: Article
Language: English
Published: Mehran University of Engineering and Technology 2025-04-01
Series: Mehran University Research Journal of Engineering and Technology
Online Access: https://murjet.muet.edu.pk/index.php/home/article/view/298
Description
Summary: This research develops a predictive model that estimates the lines of code (LOC) of software projects from technical requirements specifications. It addresses the recurring problem of inaccurate effort and cost estimation in software development, which often results in budget overruns and delays. The study analyzes a dataset of past real-life software projects and focuses on extracting relevant predictors from project requirements written in technical and easily comprehensible natural language. A pilot study is conducted first to assess feasibility. Simple Linear Regression (SLR) is then used to determine the relative predictive strength of the eight potential predictors identified earlier. The number of API calls is found to be the strongest independent predictor of LOC (R² = 0.670). The next phase constructs a software size prediction model using Forward Stepwise Multiple Linear Regression (FSMLR). The adjusted R² of the final model indicates that two factors, the number of API calls and the number of GUI fields, account for more than 80% of the variation in code size (measured in LOC). The model is validated using k-fold cross-validation, with promising results. The average MMRE (mean magnitude of relative error) across all folds is 0.203, indicating that, on average, the model's predictions are off by approximately 20% relative to the actual values. The average PRED(25) is 0.708, implying that nearly 71% of predicted size values fall within 25% of the actual size values. The model can help project managers make better decisions about project management, budgeting, and scheduling.
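The two validation metrics quoted in the summary can be illustrated with a minimal Python sketch. It uses the commonly cited definitions of MMRE and PRED(l), which the record itself does not spell out, so the exact formulas used by the authors are an assumption here. The function names and the LOC values are hypothetical and are not taken from the study's dataset.

```python
from typing import Sequence


def mre(actual: float, predicted: float) -> float:
    """Magnitude of relative error for one project: |actual - predicted| / actual."""
    return abs(actual - predicted) / actual


def mmre(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of the per-project MRE values (e.g. over one validation fold)."""
    errors = [mre(a, p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


def pred(actual: Sequence[float], predicted: Sequence[float], level: float = 0.25) -> float:
    """Fraction of projects whose MRE is at most `level` (PRED(25) when level=0.25)."""
    errors = [mre(a, p) for a, p in zip(actual, predicted)]
    return sum(e <= level for e in errors) / len(errors)


# Made-up LOC values for illustration only; NOT data from the study.
actual_loc = [1000, 2000, 4000, 5000]
predicted_loc = [1100, 1500, 4400, 2500]

print(f"MMRE     = {mmre(actual_loc, predicted_loc):.3f}")
print(f"PRED(25) = {pred(actual_loc, predicted_loc):.3f}")
```

With these made-up values, three of the four predictions fall within 25% of the actual size, so PRED(25) is 0.75, while the single large miss pulls MMRE up to roughly 0.24. In a k-fold setting, both metrics would be computed per fold on the held-out projects and then averaged, which is how the summary reports them.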
ISSN: 0254-7821, 2413-7219