OPTIMIZATIONS OF DEEP LEARNING OBJECTS DETECTION MODELS FOR INFERENCE ACCELERATION ON GENERAL-PURPOSE AND HARDWARE-ACCELERATED SINGLE-BOARD PLATFORMS

Background. Modern approaches to deep learning object detection models are described and prepared. Deep learning frameworks for model training and inference, such as TensorFlow and TensorFlow Lite, are used as the basis. The concepts of deep learning model optimization are analyzed…

Bibliographic Details
Main Authors: Dmytro Myroniuk, Bohdan Blahitko
Format: Article
Language:English
Published: Ivan Franko National University of Lviv 2025-03-01
Series:Електроніка та інформаційні технології
Subjects:
Online Access:http://publications.lnu.edu.ua/collections/index.php/electronics/article/view/4782
_version_ 1849432351419924480
author Dmytro Myroniuk
Bohdan Blahitko
author_facet Dmytro Myroniuk
Bohdan Blahitko
author_sort Dmytro Myroniuk
collection DOAJ
description Background. Modern approaches to deep learning object detection models are described and prepared. Deep learning frameworks for model training and inference, such as TensorFlow and TensorFlow Lite, are used as the basis. The concepts of deep learning model optimization are analyzed. Materials and Methods. Quantized int8 models are used as the baseline for estimating optimization effectiveness. The delegation approach provides software- or hardware-optimized variants of neural operations and is intended to speed up the inference process on target devices. Devices with limited performance resources, or microcontrollers without floating-point units, use the base optimization case: a model with int8 weights. The quantization types available in the TensorFlow Lite framework are explained in detail. Benchmarks for modern single-board devices are prepared, and the correlation between the optimization approach used, the type of single-board platform, and model inference speed is analyzed. Results and Discussion. All tested models are pretrained on the MS COCO dataset (80 classes). All models were prepared for the experiment with 8-bit full-integer quantization, and the output TFLite models were generated using TensorFlow Object Detection API Docker images and Python 3.11. The test data samples are taken from the MS COCO validation dataset archive. The input image size is 640x640 RGB. The comparison of recognition time for 640x640 RGB images was conducted on the Raspberry Pi 5, Raspberry Pi 4, and Jetson Nano 2GB. Only the Raspberry Pi 5 achieved real-time execution (at most 100 ms per frame, i.e., at least ten frames per second), as it has higher CPU performance than the other devices. Conclusion. Real-time execution was confirmed by using reference models with a reduced input image size (320x320 RGB). For the Jetson Nano target, standard TensorFlow Model Zoo models compiled with the TensorRT compiler were used as the NPU-optimized case.
Real-time execution (at most 100 ms per frame) is reached for most models and target devices. This approach is also suitable for less powerful devices with ARM Cortex-A processors.
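The 8-bit full-integer quantization discussed in the abstract can be illustrated by the standard affine quantization scheme (a real value is mapped to an int8 code via a scale and a zero point). The sketch below is a minimal, hedged illustration of that arithmetic; the function names are illustrative and are not TensorFlow Lite API calls.

```python
# Minimal sketch of int8 affine (full-integer) quantization, the scheme
# used by post-training quantization tools such as TensorFlow Lite's.
# choose_qparams/quantize/dequantize are illustrative names, not TFLite APIs.

def choose_qparams(xmin: float, xmax: float,
                   qmin: int = -128, qmax: int = 127):
    """Pick a scale and zero point mapping [xmin, xmax] onto [qmin, qmax]."""
    xmin, xmax = min(xmin, 0.0), max(xmax, 0.0)  # range must include 0.0
    scale = (xmax - xmin) / (qmax - qmin)
    zero_point = round(qmin - xmin / scale)
    return scale, zero_point

def quantize(x: float, scale: float, zero_point: int,
             qmin: int = -128, qmax: int = 127) -> int:
    """Map a real value to its int8 code, clamped to the int8 range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Recover an approximation of the original real value."""
    return (q - zero_point) * scale

scale, zp = choose_qparams(-1.0, 1.0)   # symmetric activation range
q = quantize(0.5, scale, zp)            # int8 code for 0.5
x_hat = dequantize(q, scale, zp)        # round-trip approximation
```

The round-trip error is bounded by half a quantization step (`scale / 2`), which is why int8 models trade a small accuracy loss for 4x smaller weights and integer-only arithmetic on devices without floating-point units.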
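The real-time criterion used in the benchmarks (at most 100 ms per frame) amounts to timing repeated inference on a single frame and comparing the mean latency to a budget. A minimal sketch of such a measurement, with a placeholder `infer` callable standing in for model inference on a 640x640 RGB frame (the article's actual benchmarking harness is not shown):

```python
# Hedged sketch of a per-frame latency benchmark against a real-time budget.
# `infer` and `frame` are placeholders for a model's inference call and input.
import time

REAL_TIME_BUDGET_MS = 100.0  # at most 100 ms per frame

def benchmark(infer, frame, warmup: int = 3, runs: int = 10) -> float:
    """Return the mean inference latency in milliseconds."""
    for _ in range(warmup):       # untimed runs to warm caches
        infer(frame)
    start = time.perf_counter()
    for _ in range(runs):
        infer(frame)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs

# Usage with a trivial stand-in "model":
mean_ms = benchmark(lambda f: sum(f), list(range(1000)))
is_real_time = mean_ms <= REAL_TIME_BUDGET_MS
```

Warm-up runs are excluded because first-call latency (cache population, memory allocation, delegate initialization) is not representative of steady-state per-frame cost on a single-board device.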
format Article
id doaj-art-9eefe1cf3b0f4e83916a2a3cac4b3c0d
institution Kabale University
issn 2224-087X
2224-0888
language English
publishDate 2025-03-01
publisher Ivan Franko National University of Lviv
record_format Article
series Електроніка та інформаційні технології
spelling doaj-art-9eefe1cf3b0f4e83916a2a3cac4b3c0d2025-08-20T03:27:22ZengIvan Franko National University of LvivЕлектроніка та інформаційні технології2224-087X2224-08882025-03-0129576810.30970/eli.29.6OPTIMIZATIONS OF DEEP LEARNING OBJECTS DETECTION MODELS FOR INFERENCE ACCELERATION ON GENERAL-PURPOSE AND HARDWARE-ACCELERATED SINGLE-BOARD PLATFORMSDmytro Myroniuk0https://orcid.org/0009-0001-5634-9354Bohdan Blahitko1https://orcid.org/0000-0002-0516-9353Ivan Franko National University of LvivIvan Franko National University of LvivBackground. Modern approaches to deep learning object detection models are described and prepared. Deep learning frameworks for model training and inference, such as TensorFlow and TensorFlow Lite, are used as the basis. The concepts of deep learning model optimization are analyzed. Materials and Methods. Quantized int8 models are used as the baseline for estimating optimization effectiveness. The delegation approach provides software- or hardware-optimized variants of neural operations and is intended to speed up the inference process on target devices. Devices with limited performance resources, or microcontrollers without floating-point units, use the base optimization case: a model with int8 weights. The quantization types available in the TensorFlow Lite framework are explained in detail. Benchmarks for modern single-board devices are prepared, and the correlation between the optimization approach used, the type of single-board platform, and model inference speed is analyzed. Results and Discussion. All tested models are pretrained on the MS COCO dataset (80 classes). All models were prepared for the experiment with 8-bit full-integer quantization, and the output TFLite models were generated using TensorFlow Object Detection API Docker images and Python 3.11. The test data samples are taken from the MS COCO validation dataset archive. The input image size is 640x640 RGB.
The comparison of recognition time for 640x640 RGB images was conducted on the Raspberry Pi 5, Raspberry Pi 4, and Jetson Nano 2GB. Only the Raspberry Pi 5 achieved real-time execution (at most 100 ms per frame, i.e., at least ten frames per second), as it has higher CPU performance than the other devices. Conclusion. Real-time execution was confirmed by using reference models with a reduced input image size (320x320 RGB). For the Jetson Nano target, standard TensorFlow Model Zoo models compiled with the TensorRT compiler were used as the NPU-optimized case. Real-time execution (at most 100 ms per frame) is reached for most models and target devices. This approach is also suitable for less powerful devices with ARM Cortex-A processors.http://publications.lnu.edu.ua/collections/index.php/electronics/article/view/4782single-board computersmodelingbenchmarkingneural networksobject detectionoptimization
spellingShingle Dmytro Myroniuk
Bohdan Blahitko
OPTIMIZATIONS OF DEEP LEARNING OBJECTS DETECTION MODELS FOR INFERENCE ACCELERATION ON GENERAL-PURPOSE AND HARDWARE-ACCELERATED SINGLE-BOARD PLATFORMS
Електроніка та інформаційні технології
single-board computers
modeling
benchmarking
neural networks
object detection
optimization
title OPTIMIZATIONS OF DEEP LEARNING OBJECTS DETECTION MODELS FOR INFERENCE ACCELERATION ON GENERAL-PURPOSE AND HARDWARE-ACCELERATED SINGLE-BOARD PLATFORMS
title_full OPTIMIZATIONS OF DEEP LEARNING OBJECTS DETECTION MODELS FOR INFERENCE ACCELERATION ON GENERAL-PURPOSE AND HARDWARE-ACCELERATED SINGLE-BOARD PLATFORMS
title_fullStr OPTIMIZATIONS OF DEEP LEARNING OBJECTS DETECTION MODELS FOR INFERENCE ACCELERATION ON GENERAL-PURPOSE AND HARDWARE-ACCELERATED SINGLE-BOARD PLATFORMS
title_full_unstemmed OPTIMIZATIONS OF DEEP LEARNING OBJECTS DETECTION MODELS FOR INFERENCE ACCELERATION ON GENERAL-PURPOSE AND HARDWARE-ACCELERATED SINGLE-BOARD PLATFORMS
title_short OPTIMIZATIONS OF DEEP LEARNING OBJECTS DETECTION MODELS FOR INFERENCE ACCELERATION ON GENERAL-PURPOSE AND HARDWARE-ACCELERATED SINGLE-BOARD PLATFORMS
title_sort optimizations of deep learning objects detection models for inference acceleration on general purpose and hardware accelerated single board platforms
topic single-board computers
modeling
benchmarking
neural networks
object detection
optimization
url http://publications.lnu.edu.ua/collections/index.php/electronics/article/view/4782
work_keys_str_mv AT dmytromyroniuk optimizationsofdeeplearningobjectsdetectionmodelsforinferenceaccelerationongeneralpurposeandhardwareacceleratedsingleboardplatforms
AT bohdanblahitko optimizationsofdeeplearningobjectsdetectionmodelsforinferenceaccelerationongeneralpurposeandhardwareacceleratedsingleboardplatforms