TOMFuN: A tensorized optical multimodal fusion network
This paper proposes a real-size, single-shot, high-speed, and energy-efficient tensorized optical multimodal fusion network (TOMFuN) on an electro-photonic large-scale III–V-on-Si in-memory compute engine. The TOMFuN architecture leverages a memory-efficient and low-complexity self-attention for the...
Saved in:
| Main Authors: | , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
AIP Publishing LLC
2025-03-01
|
| Series: | APL Machine Learning |
| Online Access: | http://dx.doi.org/10.1063/5.0255883 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | This paper proposes a real-size, single-shot, high-speed, and energy-efficient tensorized optical multimodal fusion network (TOMFuN) on an electro-photonic large-scale III–V-on-Si in-memory compute engine. The TOMFuN architecture leverages a memory-efficient and low-complexity self-attention for the embedding network for the text information and tensor-train and CANDECOMP/PARAFAC decompositions for compressing the model parameters in the large-scale fully connected layers. Compared to full-size counterparts, our proposed network maintains a compatible inference accuracy in multimodal sentiment analysis tasks while requiring 92.8× fewer model parameters and 51.3× fewer hardware resources. Furthermore, the impact of photonic device imperfections on the TOMFuN architecture is investigated. The simulation results show that noise-aware on-chip training exhibits superior robustness. Finally, chip performance analysis shows that our TOMFuN inference accelerator has 230.73 PetaOps computational speed, 6.51 TOPS/W power efficiency, and 2.7 µs latency with the input dimensions of 1024. |
|---|---|
| ISSN: | 2770-9019 |