Benchmarking CNN Architectures for Tool Classification: Evaluating CNN Performance on a Unique Dataset Generated by Novel Image Acquisition System
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11017576/ |
| Summary: | In this study, we introduce the ToolSurface-144 dataset, presented here for the first time. It comprises four subsets – Full R, Full S, Top R, and Top S – each containing 144 tool classes captured under varying illumination conditions and fields of view. The data were acquired with a newly developed, patented imaging approach, which is compared with conventional diffuse ring illumination to assess its effectiveness for evaluating state-of-the-art convolutional neural networks. This enabled a more targeted investigation of the role of global shape characteristics, such as silhouettes, versus localized features, such as the tool face, cutting edges, and delicate geometric structures, under different training strategies. We evaluate six state-of-the-art convolutional neural networks (AlexNet, DenseNet161, EfficientNet-B0, ResNet50, ResNet152, and VGG16) using three training strategies: fine-tuning, freezing of pre-trained layers, and training from scratch. EfficientNet-B0 consistently achieved the highest classification accuracy in nearly all experiments and datasets; with the fine-tuning strategy, it reached 99% accuracy in tool classification. ResNet50 benefited greatly from fine-tuning and freezing, achieving a significant performance increase compared to training from scratch. In contrast, ResNet152, AlexNet, and VGG16 consistently showed poor classification performance, indicating difficulties with learning and generalisation. The results show that diffuse illumination and complete tool views provide the best classification conditions, while restricted image sections with homogeneous illumination negatively affect model performance. Among the evaluated strategies, fine-tuning proved the most efficient method for developing CNN models for tool classification. |
|---|---|
| ISSN: | 2169-3536 |
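The three training strategies named in the summary (fine-tuning, freezing of pre-trained layers, and training from scratch) differ only in which parameters remain trainable. The sketch below illustrates this distinction in PyTorch; the tiny `nn.Sequential` backbone is a hypothetical stand-in for a pretrained network such as EfficientNet-B0 (in practice one would load torchvision weights), and the function name `build_model` is likewise illustrative, not from the paper.

```python
import torch.nn as nn


def build_model(strategy: str, num_classes: int = 144) -> nn.Module:
    """Sketch of the three training strategies for a 144-class tool classifier.

    `strategy` is one of "fine_tune", "freeze", or "scratch".
    """
    # Hypothetical stand-in for a pretrained backbone (e.g. EfficientNet-B0).
    backbone = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
    )
    # New classification head for the 144 tool classes of ToolSurface-144.
    head = nn.Linear(16, num_classes)

    if strategy == "freeze":
        # Freeze pre-trained layers: only the new head receives gradient updates.
        for p in backbone.parameters():
            p.requires_grad = False
    # "fine_tune": all parameters stay trainable, starting from pretrained
    # weights. "scratch": all parameters trainable, but randomly initialized.
    return nn.Sequential(backbone, head)
```

With "freeze", an optimizer would be given only the head's parameters; with "fine_tune" or "scratch", all parameters are optimized, the difference being the initialization.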