Embedded Hardware-Efficient FPGA Architecture for SVM Learning and Inference

Edge computing enables AI processing on devices with limited resources, but high computational cost, compounded by the energy limitations of such devices, makes on-device machine learning inefficient, especially for Support Vector Machine (SVM) classifiers. Although SVM classifiers are generally very accurate, they require solving a quadratic optimization problem, which makes their implementation on real-time embedded devices challenging. While Sequential Minimal Optimization (SMO) has improved the efficiency of SVM training, traditional implementations still suffer from high computational cost. In this paper, we propose Parallel SMO, a new algorithm that selects multiple violating pairs in each iteration, allowing batch-wise updates that speed up convergence and expose parallel computation. By buffering kernel values, it minimizes redundant computation, leading to better memory efficiency and faster SVM training on FPGA architectures. In addition, we present an embedded, hardware-efficient FPGA architecture that integrates Parallel SMO-based SVM learning with SVM inference. It consists of an SVM controller that schedules the operations of each clock cycle so that computation and memory access proceed concurrently. The dynamic pipeline scheduler employs parameterized modules to schedule linear or nonlinear kernels and produces dimension-based reconfigurable blocks. A configuration signal activates the required sub-blocks and clock-gates unused ones, improving resource utilization, energy efficiency, and overall performance. Across several benchmark data sets, the scheme consistently reduces clock cycles per iteration and improves throughput (up to 2427 iterations per second). It achieves up to 98% classification accuracy with low power consumption, reflected in a training power of 47 mW and high energy efficiency (up to 51.64×10³ iterations per joule). With an adaptive kernel datapath, parallel error-update execution, and best-pair selection, the scheme delivers faster convergence, higher throughput, and on-chip inference while maintaining resource efficiency.
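The abstract's core algorithmic idea (selecting several violating pairs per iteration, updating them batch-wise against a shared error snapshot, and buffering kernel rows) can be illustrated with a short sketch. The Python below is a minimal illustration of that general scheme, not the paper's implementation: the function name parallel_smo, the parameters n_pairs, tol, and C, the linear kernel, and the pair-selection heuristic are all assumptions made for this sketch.

import numpy as np

def parallel_smo(X, y, C=1.0, tol=1e-3, n_pairs=4, max_iter=500):
    """Illustrative batch-wise SMO; labels y must be +/-1, linear kernel only."""
    n = len(y)
    alpha, b = np.zeros(n), 0.0
    cache = {}                                   # buffered kernel rows

    def K_row(i):                                # linear-kernel row K[i, :]
        if i not in cache:
            cache[i] = X @ X[i]
        return cache[i]

    for _ in range(max_iter):
        # One error snapshot per iteration; the pair updates below all read
        # this same snapshot, which is what makes them batch-parallelizable.
        E = np.array([(alpha * y) @ K_row(i) + b - y[i] for i in range(n)])
        viol = np.where(((y * E < -tol) & (alpha < C)) |
                        ((y * E > tol) & (alpha > 0)))[0]   # KKT violators
        if viol.size == 0:
            break                                # converged
        order = viol[np.argsort(-np.abs(E[viol]))]
        used, pairs = set(), []
        for i in order:                          # pick disjoint violating pairs
            if len(pairs) == n_pairs:
                break
            if i in used:
                continue
            cand = [t for t in range(n) if t != i and t not in used]
            if not cand:
                break
            j = max(cand, key=lambda t: abs(E[i] - E[t]))
            used.update((i, j))
            pairs.append((i, j))
        for i, j in pairs:                       # batch-wise two-variable updates
            Kii, Kjj, Kij = K_row(i)[i], K_row(j)[j], K_row(i)[j]
            eta = Kii + Kjj - 2 * Kij
            if eta <= 0:
                continue
            if y[i] != y[j]:
                L, H = max(0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
            else:
                L, H = max(0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
            if L >= H:
                continue
            aj = float(np.clip(alpha[j] + y[j] * (E[i] - E[j]) / eta, L, H))
            ai = alpha[i] + y[i] * y[j] * (alpha[j] - aj)
            dai, daj = ai - alpha[i], aj - alpha[j]
            # standard SMO bias update, averaged over the two thresholds
            b = b - (E[i] + E[j]) / 2 \
                  - (y[i] * dai * (Kii + Kij) + y[j] * daj * (Kij + Kjj)) / 2
            alpha[i], alpha[j] = ai, aj
    return alpha, b

As a sanity check on the reported figures, 2427 iterations per second at 47 mW corresponds to 2427 / 0.047 ≈ 51.6×10³ iterations per joule, consistent with the stated energy efficiency.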

Bibliographic Details
Main Authors: B. B. Shabarinath, Muralidhar Pullakandam (both: Department of Electronics and Communication Engineering, National Institute of Technology at Warangal, Warangal, Telangana, India)
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access, Vol. 13, pp. 68930-68947
DOI: 10.1109/ACCESS.2025.3562453
ISSN: 2169-3536
Subjects: Configurable architecture; energy efficiency; edge computing; parallel SMO; support vector machines; SMO scheduler
Online Access: https://ieeexplore.ieee.org/document/10969767/