TOMFuN: A tensorized optical multimodal fusion network

Bibliographic Details
Main Authors: Xian Xiao, Yequan Zhao, Yuan Yuan, Geza Kurczveil, Marco Fiorentino, Ray Beausoleil, Zheng Zhang
Format: Article
Language: English
Published: AIP Publishing LLC 2025-03-01
Series: APL Machine Learning
Online Access: http://dx.doi.org/10.1063/5.0255883
Description
Summary: This paper proposes a real-size, single-shot, high-speed, and energy-efficient tensorized optical multimodal fusion network (TOMFuN) on an electro-photonic large-scale III–V-on-Si in-memory compute engine. The TOMFuN architecture leverages a memory-efficient, low-complexity self-attention in the embedding network for text information, and tensor-train and CANDECOMP/PARAFAC decompositions to compress the model parameters of the large-scale fully connected layers. Compared to its full-size counterparts, the proposed network maintains comparable inference accuracy on multimodal sentiment analysis tasks while requiring 92.8× fewer model parameters and 51.3× fewer hardware resources. Furthermore, the impact of photonic device imperfections on the TOMFuN architecture is investigated; simulation results show that noise-aware on-chip training exhibits superior robustness. Finally, chip performance analysis shows that the TOMFuN inference accelerator achieves 230.73 PetaOps computational speed, 6.51 TOPS/W power efficiency, and 2.7 µs latency at an input dimension of 1024.
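The summary credits the parameter savings to tensor-train (TT) decomposition of the large fully connected layers. As a rough illustration of where such savings come from (a minimal sketch only; the factor shapes and TT-ranks below are hypothetical assumptions, not values from the paper), one can count parameters of a dense layer against its TT-matrix form:

```python
# Sketch: parameter savings from tensor-train (TT) compression of a
# fully connected layer. The factorizations and TT-ranks below are
# illustrative assumptions, not the paper's actual configuration.

def tt_param_count(in_factors, out_factors, ranks):
    """Parameters in a TT-matrix whose k-th core has shape
    (r_{k-1}, m_k, n_k, r_k), where prod(m_k) x prod(n_k) equals the
    dense layer size and the boundary ranks r_0 = r_d = 1."""
    assert len(in_factors) == len(out_factors) == len(ranks) - 1
    return sum(ranks[k] * in_factors[k] * out_factors[k] * ranks[k + 1]
               for k in range(len(in_factors)))

# A 1024 x 1024 dense layer, factored as 1024 = 4 * 8 * 8 * 4.
in_factors = out_factors = [4, 8, 8, 4]
ranks = [1, 8, 8, 8, 1]  # boundary TT-ranks are always 1

full = 1024 * 1024                                    # dense: 1,048,576
tt = tt_param_count(in_factors, out_factors, ranks)   # TT:        8,448
print(full, tt, round(full / tt, 1))                  # ~124x compression
```

With these illustrative ranks the TT form stores about 124× fewer parameters than the dense layer; the paper's reported 92.8× overall reduction reflects its own choice of factorizations and ranks across all compressed layers.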
ISSN: 2770-9019