Optimizing Machine Learning Models with Data-level Approximate Computing: The Role of Diverse Sampling, Precision Scaling, Quantization and Feature Selection Strategies

Efficiency, low power consumption, and real-time processing are critical in embedded machine learning implementations, particularly for models deployed in resource-constrained environments that must process large-scale data. This paper investigates approximate computing techniques as a viable way to reduce computational complexity and optimize machine learning models, focusing on two widely used supervised models: k-nearest neighbors (KNN) and support vector machines (SVM). Although many studies compare machine learning classification techniques, the combined use of optimization strategies remains underexplored; in particular, little work combines feature selection, sampling, quantization, precision scaling, and relaxation methods to optimize the acquisition of training and validation data, especially for medical diagnosis datasets. We propose a framework that applies data-level approximate computing techniques, including diverse sampling strategies, precision scaling, quantization, and feature selection, and evaluate their impact on the computational efficiency and accuracy of KNN and SVM models. Experimental results demonstrate that careful application of approximate computing strategies, even in critical applications such as medical diagnosis, can yield considerable efficiency gains while maintaining acceptable accuracy. Applying these methods in combination (selecting 3 features, quantizing the data values to 8 levels, randomly sampling with a 30% reduction, and scaling the precision to 5 bits) reduced computation by 87.5%, memory usage by 76.9%, and delay by 17%, with no degradation in accuracy, as validated by tenfold cross-validation, training-data validation, and full-dataset validation. This study confirms the potential of approximate computing to optimize machine learning workflows, making it particularly suitable for applications with limited computational resources. The source code is publicly available at https://github.com/AyadMDalloo/DatalvlAxC.
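For illustration, the combined recipe reported in the abstract (3 selected features, 8 quantization levels, a 30% random-sampling reduction, and 5-bit precision scaling) can be sketched in a few lines of Python. The sketch below is not the authors' code (that is in the linked GitHub repository): it assumes a scikit-learn workflow, uses the Wisconsin breast-cancer dataset as a stand-in medical dataset, invents the helper names quantize and precision_scale, and interprets "5 bits" as 5 fractional bits of a fixed-point representation.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def quantize(X, levels=8):
    # Map each feature onto `levels` uniformly spaced values between its min and max.
    lo, hi = X.min(axis=0), X.max(axis=0)
    step = np.where(hi > lo, (hi - lo) / (levels - 1), 1.0)
    return np.round((X - lo) / step) * step + lo

def precision_scale(X, bits=5):
    # Truncate values to a fixed-point grid with `bits` fractional bits.
    scale = 2.0 ** bits
    return np.floor(X * scale) / scale

X, y = load_breast_cancer(return_X_y=True)

# Step 1, feature selection: keep the 3 most discriminative features.
X = SelectKBest(f_classif, k=3).fit_transform(X, y)

# Step 2, quantization: restrict each feature to 8 discrete levels.
X = quantize(X, levels=8)

# Step 3, random sampling: discard 30% of the rows.
rng = np.random.default_rng(seed=0)
keep = rng.choice(len(X), size=int(0.7 * len(X)), replace=False)
X, y = X[keep], y[keep]

# Step 4, precision scaling: represent values with 5 fractional bits.
X = precision_scale(X, bits=5)

# Check accuracy of KNN on the approximated data with tenfold cross-validation.
knn = KNeighborsClassifier(n_neighbors=5)
scores = cross_val_score(knn, X, y, cv=10)
print(f"10-fold CV accuracy on approximated data: {scores.mean():.3f}")

The same reduced arrays could be passed to sklearn.svm.SVC to exercise the SVM side of the comparison; the choice of k=5 neighbors is likewise an illustrative assumption.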

Bibliographic Details
Main Authors: Ayad M. Dalloo (Department of Communication Engineering, University of Technology, Baghdad, Iraq; corresponding author), Amjad J. Humaidi (Control and Systems Engineering Department, University of Technology, Baghdad, Iraq)
Format: Article
Language: English
Published: Elsevier, 2024-12-01
Series: Results in Engineering
ISSN: 2590-1230
DOI: 10.1016/j.rineng.2024.103451
Subjects: Machine Learning; Approximate Computing; Sampling, quantization and precision scaling; Critical Applications
Online Access:http://www.sciencedirect.com/science/article/pii/S2590123024017031