Embedded Hardware-Efficient FPGA Architecture for SVM Learning and Inference

Edge computing enables AI processing on devices with limited resources, but high computational cost, compounded by the energy limitations of such devices, makes on-device machine learning inefficient, especially for Support Vector Machine (SVM) classifiers. Although SVM classifiers are generally very accurate, they require solving a quadratic optimization problem, which makes their implementation on real-time embedded devices challenging. While Sequential Minimal Optimization (SMO) has improved the efficiency of SVM training, traditional implementations still suffer from high computational cost. In this paper, we propose Parallel SMO, a new algorithm that selects multiple violating pairs in each iteration, allowing batch-wise updates that speed up convergence and expose parallel computation. By buffering kernel values, it minimizes redundant computation, leading to better memory efficiency and faster SVM training on FPGA architectures. In addition, we present an embedded, hardware-efficient FPGA architecture that integrates Parallel SMO-based SVM learning with SVM inference. It consists of an SVM controller that schedules the operations of each clock cycle so that computation and memory access proceed concurrently. The dynamic pipeline scheduler employs parameterized modules to schedule linear or nonlinear kernels and produces dimension-based reconfigurable blocks. A configuration signal activates the required sub-blocks and clock-gates unused ones, improving resource utilization, energy efficiency, and overall performance. Across several benchmark data sets, the scheme consistently reduces clock cycles per iteration and improves throughput (up to 2427 iterations per second). It achieves up to 98% classification accuracy with low power consumption, reflected in a training power of 47 mW and high energy efficiency (up to 51.64×10³ iterations per joule). With an adaptive kernel datapath, parallel error-update execution, and best-pair selection, the scheme delivers faster convergence, higher throughput, and on-chip inference while maintaining resource efficiency.
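The abstract's core algorithmic idea (selecting several violating pairs per iteration, updating them batch-wise against a shared error snapshot, and buffering kernel rows) can be illustrated with a short sketch. The Python below is a minimal illustration of that general scheme, not the paper's implementation: the function name parallel_smo, the parameters n_pairs, tol, and C, the linear kernel, and the pair-selection heuristic are all assumptions made for this sketch.

import numpy as np

def parallel_smo(X, y, C=1.0, tol=1e-3, n_pairs=4, max_iter=500):
    """Illustrative batch-wise SMO; labels y must be +/-1, linear kernel only."""
    n = len(y)
    alpha, b = np.zeros(n), 0.0
    cache = {}                                   # buffered kernel rows

    def K_row(i):                                # linear-kernel row K[i, :]
        if i not in cache:
            cache[i] = X @ X[i]
        return cache[i]

    for _ in range(max_iter):
        # One error snapshot per iteration; the pair updates below all read
        # this same snapshot, which is what makes them batch-parallelizable.
        E = np.array([(alpha * y) @ K_row(i) + b - y[i] for i in range(n)])
        viol = np.where(((y * E < -tol) & (alpha < C)) |
                        ((y * E > tol) & (alpha > 0)))[0]   # KKT violators
        if viol.size == 0:
            break                                # converged
        order = viol[np.argsort(-np.abs(E[viol]))]
        used, pairs = set(), []
        for i in order:                          # pick disjoint violating pairs
            if len(pairs) == n_pairs:
                break
            if i in used:
                continue
            cand = [t for t in range(n) if t != i and t not in used]
            if not cand:
                break
            j = max(cand, key=lambda t: abs(E[i] - E[t]))
            used.update((i, j))
            pairs.append((i, j))
        for i, j in pairs:                       # batch-wise two-variable updates
            Kii, Kjj, Kij = K_row(i)[i], K_row(j)[j], K_row(i)[j]
            eta = Kii + Kjj - 2 * Kij
            if eta <= 0:
                continue
            if y[i] != y[j]:
                L, H = max(0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
            else:
                L, H = max(0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
            if L >= H:
                continue
            aj = float(np.clip(alpha[j] + y[j] * (E[i] - E[j]) / eta, L, H))
            ai = alpha[i] + y[i] * y[j] * (alpha[j] - aj)
            dai, daj = ai - alpha[i], aj - alpha[j]
            # standard SMO bias update, averaged over the two thresholds
            b = b - (E[i] + E[j]) / 2 \
                  - (y[i] * dai * (Kii + Kij) + y[j] * daj * (Kij + Kjj)) / 2
            alpha[i], alpha[j] = ai, aj
    return alpha, b

As a sanity check on the reported figures, 2427 iterations per second at 47 mW corresponds to 2427 / 0.047 ≈ 51.6×10³ iterations per joule, consistent with the stated energy efficiency.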

Bibliographic Details
Main Authors: B. B. Shabarinath, Muralidhar Pullakandam (both: Department of Electronics and Communication Engineering, National Institute of Technology at Warangal, Warangal, Telangana, India)
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access, Vol. 13, pp. 68930-68947
DOI: 10.1109/ACCESS.2025.3562453
ISSN: 2169-3536
Subjects: Configurable architecture; energy efficiency; edge computing; parallel SMO; support vector machines; SMO scheduler
Online Access: https://ieeexplore.ieee.org/document/10969767/