Embedded Hardware-Efficient FPGA Architecture for SVM Learning and Inference
Edge computing allows AI processing to run on devices with limited resources, but high computational cost and the tight energy budgets of such devices make on-device machine learning inefficient, especially for Support Vector Machine (SVM) classifiers. Although SVM clas...
| Main Authors: | B. B. Shabarinath, Muralidhar Pullakandam |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Configurable architecture; energy efficiency; edge computing; parallel SMO; support vector machines; SMO scheduler |
| Online Access: | https://ieeexplore.ieee.org/document/10969767/ |
| _version_ | 1849714449672306688 |
|---|---|
| author | B. B. Shabarinath Muralidhar Pullakandam |
| author_facet | B. B. Shabarinath Muralidhar Pullakandam |
| author_sort | B. B. Shabarinath |
| collection | DOAJ |
| description | Edge computing allows AI processing to run on devices with limited resources, but high computational cost and the tight energy budgets of such devices make on-device machine learning inefficient, especially for Support Vector Machine (SVM) classifiers. Although SVM classifiers are generally very accurate, they require solving a quadratic optimization problem, making their implementation in real-time embedded devices challenging. While Sequential Minimal Optimization (SMO) has enhanced the efficiency of SVM training, traditional implementations still suffer from high computational cost. In this paper, we propose Parallel SMO, a new algorithm that selects multiple violating pairs in each iteration, allowing batch-wise updates that enhance convergence speed and exploit parallel computation. By buffering kernel values, it minimizes redundant computations, leading to improved memory efficiency and faster SVM training on FPGA architectures. In addition, we present an embedded hardware-efficient FPGA architecture that integrates SVM learning based on Parallel SMO with SVM inference. It consists of an SVM controller that schedules the operations of each clock cycle so that computation and memory access happen concurrently. The dynamic pipeline scheduler employs parameterized modules to schedule linear or nonlinear kernels and produces dimension-based reconfigurable blocks. A configuration signal turns on the corresponding sub-blocks and clock-gates unused ones, thus enhancing resource utilization, energy efficiency, and overall performance. On several benchmark data sets, the scheme consistently reduces clock cycles per iteration and improves throughput (up to 2427 iterations per second).
It achieves up to 98% classification accuracy with low power consumption, as reflected by a training power of <inline-formula> <tex-math notation="LaTeX">$47~\mathrm{mW}$ </tex-math></inline-formula> and high energy efficiency (up to <inline-formula> <tex-math notation="LaTeX">$51.64\times 10^{3}$ </tex-math></inline-formula> iterations per joule). With the aid of an adaptive kernel datapath, parallel error-update execution, and best-pair selection, the scheme facilitates faster convergence, higher throughput, and on-chip inference while maintaining resource efficiency. |
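The core Parallel SMO step described above, selecting multiple maximally violating pairs per iteration so they can be updated batch-wise, can be sketched in software. The function below is a simplified illustration using the standard I_up/I_low working-set formulation from SMO solvers; the greedy pairing heuristic, the `tol` threshold, and the `batch` parameter are illustrative assumptions, not the paper's exact hardware selection rule.

```python
import numpy as np

def select_violating_pairs(y, alpha, grad, C, batch, tol=1e-3):
    """Greedy batch working-set selection for SMO (simplified sketch).

    `grad` is the gradient of the dual objective at `alpha`
    (so grad = -1 everywhere at alpha = 0 for the standard SVM dual).
    A pair (i in I_up, j in I_low) violates optimality when
    (-y_i * grad_i) - (-y_j * grad_j) > tol.
    """
    ygrad = -y * grad
    # I_up: alpha_t may still move "up" along y_t; I_low: it may move "down".
    i_up = np.where(((y == 1) & (alpha < C)) | ((y == -1) & (alpha > 0)))[0]
    i_low = np.where(((y == 1) & (alpha > 0)) | ((y == -1) & (alpha < C)))[0]
    # Rank candidates: most-positive ygrad in I_up, most-negative in I_low.
    up_sorted = i_up[np.argsort(-ygrad[i_up], kind="stable")]
    low_sorted = i_low[np.argsort(ygrad[i_low], kind="stable")]
    pairs, used = [], set()
    for i, j in zip(up_sorted, low_sorted):
        if i == j or i in used or j in used:
            continue
        if ygrad[i] - ygrad[j] <= tol:  # remaining pairs violate even less
            break
        pairs.append((int(i), int(j)))
        used.update((int(i), int(j)))
        if len(pairs) == batch:
            break
    return pairs
```

In the architecture described, the kernel values needed for the selected pairs would then be served from the on-chip kernel buffer rather than recomputed, and all pairs in the batch update their multipliers and errors in parallel.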
| format | Article |
| id | doaj-art-74c165940a974158a77490d35e148fd8 |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-74c165940a974158a77490d35e148fd8 2025-08-20T03:13:42Z eng IEEE IEEE Access 2169-3536 2025-01-01 vol. 13, pp. 68930-68947, doi: 10.1109/ACCESS.2025.3562453, document 10969767. Embedded Hardware-Efficient FPGA Architecture for SVM Learning and Inference. B. B. Shabarinath (https://orcid.org/0000-0001-6664-208X) and Muralidhar Pullakandam (https://orcid.org/0000-0002-3288-9989), Department of Electronics and Communication Engineering, National Institute of Technology at Warangal, Warangal, Telangana, India. https://ieeexplore.ieee.org/document/10969767/ Configurable architecture; energy efficiency; edge computing; parallel SMO; support vector machines; SMO scheduler |
| spellingShingle | B. B. Shabarinath Muralidhar Pullakandam Embedded Hardware-Efficient FPGA Architecture for SVM Learning and Inference IEEE Access Configurable architecture energy efficiency edge computing parallel SMO support vector machines SMO scheduler |
| title | Embedded Hardware-Efficient FPGA Architecture for SVM Learning and Inference |
| title_full | Embedded Hardware-Efficient FPGA Architecture for SVM Learning and Inference |
| title_fullStr | Embedded Hardware-Efficient FPGA Architecture for SVM Learning and Inference |
| title_full_unstemmed | Embedded Hardware-Efficient FPGA Architecture for SVM Learning and Inference |
| title_short | Embedded Hardware-Efficient FPGA Architecture for SVM Learning and Inference |
| title_sort | embedded hardware efficient fpga architecture for svm learning and inference |
| topic | Configurable architecture energy efficiency edge computing parallel SMO support vector machines SMO scheduler |
| url | https://ieeexplore.ieee.org/document/10969767/ |
| work_keys_str_mv | AT bbshabarinath embeddedhardwareefficientfpgaarchitectureforsvmlearningandinference AT muralidharpullakandam embeddedhardwareefficientfpgaarchitectureforsvmlearningandinference |