TOMFuN: A tensorized optical multimodal fusion network


Bibliographic Details
Main Authors: Xian Xiao, Yequan Zhao, Yuan Yuan, Geza Kurczveil, Marco Fiorentino, Ray Beausoleil, Zheng Zhang
Format: Article
Language: English
Published: AIP Publishing LLC 2025-03-01
Series:APL Machine Learning
Online Access:http://dx.doi.org/10.1063/5.0255883
collection DOAJ
description This paper proposes a real-size, single-shot, high-speed, and energy-efficient tensorized optical multimodal fusion network (TOMFuN) on an electro-photonic large-scale III–V-on-Si in-memory compute engine. The TOMFuN architecture leverages a memory-efficient, low-complexity self-attention mechanism for the text-embedding network, and tensor-train and CANDECOMP/PARAFAC decompositions for compressing the model parameters of the large-scale fully connected layers. Compared to its full-size counterparts, the proposed network maintains comparable inference accuracy on multimodal sentiment analysis tasks while requiring 92.8× fewer model parameters and 51.3× fewer hardware resources. Furthermore, the impact of photonic device imperfections on the TOMFuN architecture is investigated; simulation results show that noise-aware on-chip training exhibits superior robustness. Finally, chip-performance analysis shows that the TOMFuN inference accelerator achieves a computational speed of 230.73 PetaOps, a power efficiency of 6.51 TOPS/W, and a latency of 2.7 µs for an input dimension of 1024.
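The abstract credits tensor-train (TT) and CANDECOMP/PARAFAC decompositions for the large parameter savings in the fully connected layers. As a rough illustration only (not the paper's actual factorization; the mode sizes and TT-rank below are invented for the sketch), here is how a 1024×1024 dense weight matrix can be stored as three small TT-cores and contracted back on demand:

```python
import numpy as np

# Hypothetical shapes: reshape input dim 1024 -> (8, 8, 16) and
# output dim 1024 -> (8, 8, 16); TT-rank r = 4 (illustrative values).
in_modes, out_modes, rank = [8, 8, 16], [8, 8, 16], 4

rng = np.random.default_rng(0)
# TT-core G_k has shape (r_{k-1}, in_k, out_k, r_k); boundary ranks are 1.
ranks = [1, rank, rank, 1]
cores = [rng.standard_normal((ranks[k], in_modes[k], out_modes[k], ranks[k + 1]))
         for k in range(3)]

def tt_to_dense(cores):
    """Contract the TT-cores back into the full (1024, 1024) weight matrix."""
    w = cores[0]  # shape (1, i0, o0, r1)
    for g in cores[1:]:
        # chain over the shared rank axis: (..., r) x (r, i, o, r') -> (..., i, o, r')
        w = np.tensordot(w, g, axes=([-1], [0]))
    w = w.squeeze(0).squeeze(-1)        # drop the boundary rank-1 axes
    w = w.transpose(0, 2, 4, 1, 3, 5)   # (i0,o0,i1,o1,i2,o2) -> (i0,i1,i2,o0,o1,o2)
    return w.reshape(np.prod(in_modes), np.prod(out_modes))

dense_params = int(np.prod(in_modes)) * int(np.prod(out_modes))  # 1024 * 1024
tt_params = sum(c.size for c in cores)                           # 256 + 1024 + 1024
print(dense_params, tt_params, dense_params / tt_params)
```

With these made-up shapes the TT format stores 2304 numbers instead of 1 048 576, a ~455× reduction; the paper's reported 92.8× figure comes from its own (different) mode sizes and ranks applied across the whole model.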
id doaj-art-d98cd23513dc4e15b704c77c5031360e
institution OA Journals
issn 2770-9019
record_format Article
spelling APL Machine Learning, Vol. 3, Iss. 1, 016121 (2025-03-01); doi:10.1063/5.0255883
Author affiliations:
Xian Xiao, Yuan Yuan, Geza Kurczveil, Marco Fiorentino, Ray Beausoleil: Hewlett Packard Labs, Hewlett Packard Enterprise, 820 N. McCarthy Blvd., Milpitas, California 95305, USA
Yequan Zhao, Zheng Zhang: Department of Electrical and Computer Engineering, University of California, Santa Barbara, California 93106, USA