OPTIMIZATIONS OF DEEP LEARNING OBJECTS DETECTION MODELS FOR INFERENCE ACCELERATION ON GENERAL-PURPOSE AND HARDWARE-ACCELERATED SINGLE-BOARD PLATFORMS

Background. Modern approaches to deep learning object detection models are described and prepared. Deep learning frameworks for model training and inference, such as TensorFlow and TensorFlow Lite, are used as the basis. The concepts of deep learning model optimization are analyzed…

Bibliographic Details
Main Authors: Dmytro Myroniuk, Bohdan Blahitko
Format: Article
Language:English
Published: Ivan Franko National University of Lviv 2025-03-01
Series:Електроніка та інформаційні технології
Subjects:
Online Access:http://publications.lnu.edu.ua/collections/index.php/electronics/article/view/4782
_version_ 1849432351419924480
author Dmytro Myroniuk
Bohdan Blahitko
author_facet Dmytro Myroniuk
Bohdan Blahitko
author_sort Dmytro Myroniuk
collection DOAJ
description Background. Modern approaches to deep learning object detection models are described and prepared. Deep learning frameworks for model training and inference, such as TensorFlow and TensorFlow Lite, are used as the basis. The concepts of deep learning model optimization are analyzed. Materials and Methods. Quantized int8 models are used as the baseline for estimating optimization effectiveness. The delegation approach provides software- or hardware-optimized variants of neural operations and is intended to speed up the inference process on target devices. Devices with limited performance resources, or microcontrollers without floating-point units, use the base optimization case: a model with int8 weights. The quantization types available in the TensorFlow Lite framework are explained in detail. Benchmarks for modern single-board devices are prepared, and the correlation between the optimization approach used, the type of single-board platform, and model inference speed is analyzed. Results and Discussion. All tested models are pretrained on the MS COCO dataset (80 classes). All models were prepared for the experiment with 8-bit full-integer quantization, and the output TFLite models were generated using TensorFlow Object Detection API Docker images and Python 3.11. The test data samples are taken from the MS COCO validation dataset archive. The input image size is 640x640 RGB. The comparison of recognition time for 640x640 RGB images was conducted on the Raspberry Pi 5, Raspberry Pi 4, and Jetson Nano 2GB. Only the Raspberry Pi 5 achieved real-time execution (at most 100 ms per frame, i.e., at least ten frames per second), as it has higher CPU performance than the other devices. Conclusion. Real-time execution was confirmed by using reference models with a reduced input image size (320x320 RGB). For the Jetson Nano target, standard TensorFlow Model Zoo models compiled with the TensorRT compiler were used as the NPU-optimized case.
Real-time execution (at most 100 ms per frame) is reached for most models and target devices. This approach is also suitable for less powerful devices with ARM Cortex-A processors.
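The 8-bit full-integer quantization discussed in the abstract can be illustrated by the standard affine quantization scheme (a real value is mapped to an int8 code via a scale and a zero point). The sketch below is a minimal, hedged illustration of that arithmetic; the function names are illustrative and are not TensorFlow Lite API calls.

```python
# Minimal sketch of int8 affine (full-integer) quantization, the scheme
# used by post-training quantization tools such as TensorFlow Lite's.
# choose_qparams/quantize/dequantize are illustrative names, not TFLite APIs.

def choose_qparams(xmin: float, xmax: float,
                   qmin: int = -128, qmax: int = 127):
    """Pick a scale and zero point mapping [xmin, xmax] onto [qmin, qmax]."""
    xmin, xmax = min(xmin, 0.0), max(xmax, 0.0)  # range must include 0.0
    scale = (xmax - xmin) / (qmax - qmin)
    zero_point = round(qmin - xmin / scale)
    return scale, zero_point

def quantize(x: float, scale: float, zero_point: int,
             qmin: int = -128, qmax: int = 127) -> int:
    """Map a real value to its int8 code, clamped to the int8 range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Recover an approximation of the original real value."""
    return (q - zero_point) * scale

scale, zp = choose_qparams(-1.0, 1.0)   # symmetric activation range
q = quantize(0.5, scale, zp)            # int8 code for 0.5
x_hat = dequantize(q, scale, zp)        # round-trip approximation
```

The round-trip error is bounded by half a quantization step (`scale / 2`), which is why int8 models trade a small accuracy loss for 4x smaller weights and integer-only arithmetic on devices without floating-point units.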
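The real-time criterion used in the benchmarks (at most 100 ms per frame) amounts to timing repeated inference on a single frame and comparing the mean latency to a budget. A minimal sketch of such a measurement, with a placeholder `infer` callable standing in for model inference on a 640x640 RGB frame (the article's actual benchmarking harness is not shown):

```python
# Hedged sketch of a per-frame latency benchmark against a real-time budget.
# `infer` and `frame` are placeholders for a model's inference call and input.
import time

REAL_TIME_BUDGET_MS = 100.0  # at most 100 ms per frame

def benchmark(infer, frame, warmup: int = 3, runs: int = 10) -> float:
    """Return the mean inference latency in milliseconds."""
    for _ in range(warmup):       # untimed runs to warm caches
        infer(frame)
    start = time.perf_counter()
    for _ in range(runs):
        infer(frame)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs

# Usage with a trivial stand-in "model":
mean_ms = benchmark(lambda f: sum(f), list(range(1000)))
is_real_time = mean_ms <= REAL_TIME_BUDGET_MS
```

Warm-up runs are excluded because first-call latency (cache population, memory allocation, delegate initialization) is not representative of steady-state per-frame cost on a single-board device.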
format Article
id doaj-art-9eefe1cf3b0f4e83916a2a3cac4b3c0d
institution Kabale University
issn 2224-087X
2224-0888
language English
publishDate 2025-03-01
publisher Ivan Franko National University of Lviv
record_format Article
series Електроніка та інформаційні технології
spelling doaj-art-9eefe1cf3b0f4e83916a2a3cac4b3c0d2025-08-20T03:27:22ZengIvan Franko National University of LvivЕлектроніка та інформаційні технології2224-087X2224-08882025-03-0129576810.30970/eli.29.6OPTIMIZATIONS OF DEEP LEARNING OBJECTS DETECTION MODELS FOR INFERENCE ACCELERATION ON GENERAL-PURPOSE AND HARDWARE-ACCELERATED SINGLE-BOARD PLATFORMSDmytro Myroniuk0https://orcid.org/0009-0001-5634-9354Bohdan Blahitko1https://orcid.org/0000-0002-0516-9353Ivan Franko National University of LvivIvan Franko National University of LvivBackground. Modern approaches to deep learning object detection models are described and prepared. Deep learning frameworks for model training and inference, such as TensorFlow and TensorFlow Lite, are used as the basis. The concepts of deep learning model optimization are analyzed. Materials and Methods. Quantized int8 models are used as the baseline for estimating optimization effectiveness. The delegation approach provides software- or hardware-optimized variants of neural operations and is intended to speed up the inference process on target devices. Devices with limited performance resources, or microcontrollers without floating-point units, use the base optimization case: a model with int8 weights. The quantization types available in the TensorFlow Lite framework are explained in detail. Benchmarks for modern single-board devices are prepared, and the correlation between the optimization approach used, the type of single-board platform, and model inference speed is analyzed. Results and Discussion. All tested models are pretrained on the MS COCO dataset (80 classes). All models were prepared for the experiment with 8-bit full-integer quantization, and the output TFLite models were generated using TensorFlow Object Detection API Docker images and Python 3.11. The test data samples are taken from the MS COCO validation dataset archive. The input image size is 640x640 RGB.
The comparison of recognition time for 640x640 RGB images was conducted on the Raspberry Pi 5, Raspberry Pi 4, and Jetson Nano 2GB. Only the Raspberry Pi 5 achieved real-time execution (at most 100 ms per frame, i.e., at least ten frames per second), as it has higher CPU performance than the other devices. Conclusion. Real-time execution was confirmed by using reference models with a reduced input image size (320x320 RGB). For the Jetson Nano target, standard TensorFlow Model Zoo models compiled with the TensorRT compiler were used as the NPU-optimized case. Real-time execution (at most 100 ms per frame) is reached for most models and target devices. This approach is also suitable for less powerful devices with ARM Cortex-A processors.http://publications.lnu.edu.ua/collections/index.php/electronics/article/view/4782single-board computersmodelingbenchmarkingneural networksobject detectionoptimization
spellingShingle Dmytro Myroniuk
Bohdan Blahitko
OPTIMIZATIONS OF DEEP LEARNING OBJECTS DETECTION MODELS FOR INFERENCE ACCELERATION ON GENERAL-PURPOSE AND HARDWARE-ACCELERATED SINGLE-BOARD PLATFORMS
Електроніка та інформаційні технології
single-board computers
modeling
benchmarking
neural networks
object detection
optimization
title OPTIMIZATIONS OF DEEP LEARNING OBJECTS DETECTION MODELS FOR INFERENCE ACCELERATION ON GENERAL-PURPOSE AND HARDWARE-ACCELERATED SINGLE-BOARD PLATFORMS
title_full OPTIMIZATIONS OF DEEP LEARNING OBJECTS DETECTION MODELS FOR INFERENCE ACCELERATION ON GENERAL-PURPOSE AND HARDWARE-ACCELERATED SINGLE-BOARD PLATFORMS
title_fullStr OPTIMIZATIONS OF DEEP LEARNING OBJECTS DETECTION MODELS FOR INFERENCE ACCELERATION ON GENERAL-PURPOSE AND HARDWARE-ACCELERATED SINGLE-BOARD PLATFORMS
title_full_unstemmed OPTIMIZATIONS OF DEEP LEARNING OBJECTS DETECTION MODELS FOR INFERENCE ACCELERATION ON GENERAL-PURPOSE AND HARDWARE-ACCELERATED SINGLE-BOARD PLATFORMS
title_short OPTIMIZATIONS OF DEEP LEARNING OBJECTS DETECTION MODELS FOR INFERENCE ACCELERATION ON GENERAL-PURPOSE AND HARDWARE-ACCELERATED SINGLE-BOARD PLATFORMS
title_sort optimizations of deep learning objects detection models for inference acceleration on general purpose and hardware accelerated single board platforms
topic single-board computers
modeling
benchmarking
neural networks
object detection
optimization
url http://publications.lnu.edu.ua/collections/index.php/electronics/article/view/4782
work_keys_str_mv AT dmytromyroniuk optimizationsofdeeplearningobjectsdetectionmodelsforinferenceaccelerationongeneralpurposeandhardwareacceleratedsingleboardplatforms
AT bohdanblahitko optimizationsofdeeplearningobjectsdetectionmodelsforinferenceaccelerationongeneralpurposeandhardwareacceleratedsingleboardplatforms