Fourier or Wavelet bases as counterpart self-attention in spikformer for efficient visual classification
Main Authors: | Qingyu Wang, Duzhen Zhang, Xinyuan Cai, Tielin Zhang, Bo Xu |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2025-01-01 |
Series: | Frontiers in Neuroscience |
Subjects: | spiking neural network; transformer; Fourier/Wavelet transform; visual classification; computational efficiency |
Online Access: | https://www.frontiersin.org/articles/10.3389/fnins.2024.1516868/full |
author | Qingyu Wang, Duzhen Zhang, Xinyuan Cai, Tielin Zhang, Bo Xu |
collection | DOAJ |
description | The energy-efficient spikformer has been proposed by integrating the biologically plausible spiking neural network (SNN) with the artificial transformer, whereby spiking self-attention (SSA) is used to achieve both higher accuracy and lower computational cost. However, self-attention may not always be necessary, especially under sparse, spike-form computation. In this article, we replace vanilla SSA (which uses dynamic bases calculated from Query and Key) with the spike-form Fourier transform, the wavelet transform, and their combinations (which use fixed trigonometric or wavelet bases), based on the key hypothesis that both rely on a set of basis functions for information transformation. The resulting Fourier-or-Wavelet-based spikformer (FWformer) is proposed and verified on visual classification tasks, covering both static-image and event-based video datasets. Compared to the standard spikformer, the FWformer achieves comparable or even higher accuracy (by 0.4%–1.5%), higher running speed (9%–51% faster in training and 19%–70% faster in inference), reduced theoretical energy consumption (by 20%–25%), and reduced graphics processing unit (GPU) memory usage (by 4%–26%). Our results indicate that the continued refinement of transformers inspired either by biological discovery (spike-form computation) or by information theory (Fourier or wavelet transforms) is promising. |
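The abstract's core idea, that a fixed spectral basis can stand in for the dynamic Query/Key bases of spiking self-attention, can be sketched compactly. Below is a minimal PyTorch illustration, not the authors' released code: the spike tensor is mixed along the token and feature dimensions with a fixed discrete Fourier transform (FNet-style token mixing), and the real part is re-binarized by a thresholded spiking nonlinearity with a surrogate gradient. The names `FourierMixer` and `SpikeFn`, the rectangular surrogate gradient, and the threshold value 0.5 are illustrative assumptions, not details from the paper; a wavelet variant would swap the DFT for a fixed wavelet decomposition.

```python
import torch
import torch.nn as nn


class SpikeFn(torch.autograd.Function):
    """Heaviside spike with a rectangular surrogate gradient (an assumed
    neuron model; the paper's exact spiking dynamics may differ)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x >= 0.0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Let gradients pass only near the firing threshold.
        return grad_out * (x.abs() < 0.5).float()


class FourierMixer(nn.Module):
    """Stand-in for spiking self-attention: mixes tokens with a fixed 2D DFT
    (FNet-style) instead of dynamic Query/Key bases, then re-binarizes."""

    def __init__(self, threshold: float = 0.5):
        super().__init__()
        self.threshold = threshold  # illustrative firing threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (T, B, N, D) spike tensor: time, batch, tokens, features.
        y = torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real
        return SpikeFn.apply(y - self.threshold)  # back to {0, 1} spikes


# Toy usage: a sparse binary tensor in place of real patch-embedding spikes.
spikes = (torch.rand(4, 2, 16, 32) > 0.8).float()
out = FourierMixer()(spikes)
print(out.shape)  # torch.Size([4, 2, 16, 32]); values stay in {0., 1.}
```

Because the basis is fixed, no Query/Key products are computed, which is consistent with the speed, energy, and GPU-memory savings the abstract reports relative to standard SSA.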
format | Article |
id | doaj-art-1b87288d6bd247bbbf33061fb3404d81 |
institution | Kabale University |
issn | 1662-453X |
language | English |
publishDate | 2025-01-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Neuroscience |
spelling | Frontiers Media S.A., Frontiers in Neuroscience, ISSN 1662-453X, vol. 18, 2025-01-01, article 1516868, DOI 10.3389/fnins.2024.1516868 (record doaj-art-1b87288d6bd247bbbf33061fb3404d81, indexed 2025-01-29T06:46:15Z). Fourier or Wavelet bases as counterpart self-attention in spikformer for efficient visual classification. Authors and affiliations: Qingyu Wang (Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China); Duzhen Zhang (Institute of Automation, Chinese Academy of Sciences, Beijing, China); Xinyuan Cai (Institute of Automation, Chinese Academy of Sciences, Beijing, China); Tielin Zhang (Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China); Bo Xu (Institute of Automation, Chinese Academy of Sciences, Beijing, China). https://www.frontiersin.org/articles/10.3389/fnins.2024.1516868/full |
title | Fourier or Wavelet bases as counterpart self-attention in spikformer for efficient visual classification |
topic | spiking neural network; transformer; Fourier/Wavelet transform; visual classification; computational efficiency |
url | https://www.frontiersin.org/articles/10.3389/fnins.2024.1516868/full |