TOMFuN: A tensorized optical multimodal fusion network
This paper proposes a real-size, single-shot, high-speed, and energy-efficient tensorized optical multimodal fusion network (TOMFuN) on an electro-photonic large-scale III–V-on-Si in-memory compute engine. The TOMFuN architecture leverages a memory-efficient, low-complexity self-attention mechanism for the text-embedding network, together with tensor-train and CANDECOMP/PARAFAC decompositions to compress the model parameters in the large-scale fully connected layers. Compared to its full-size counterparts, the proposed network maintains comparable inference accuracy on multimodal sentiment analysis tasks while requiring 92.8× fewer model parameters and 51.3× fewer hardware resources. Furthermore, the impact of photonic device imperfections on the TOMFuN architecture is investigated; simulation results show that noise-aware on-chip training exhibits superior robustness. Finally, chip performance analysis shows that the TOMFuN inference accelerator achieves a computational speed of 230.73 PetaOps, a power efficiency of 6.51 TOPS/W, and a latency of 2.7 µs at an input dimension of 1024.
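The parameter savings from the tensor-train (TT) compression mentioned in the abstract can be illustrated with a small counting sketch. The layer size, mode factorization, and TT ranks below are hypothetical choices for illustration only; they are not the configuration reported in the paper.

```python
# Illustrative parameter-count comparison for tensor-train (TT) compression
# of a fully connected layer. The layer size (1024 -> 1024), the mode
# factorization (8 * 16 * 8), and the TT ranks are hypothetical examples.

def dense_params(in_dim, out_dim):
    """A full weight matrix stores in_dim * out_dim parameters."""
    return in_dim * out_dim

def tt_params(in_modes, out_modes, ranks):
    """TT format reshapes the weight matrix into a tensor with one mode per
    (in_modes[k], out_modes[k]) pair and stores a 4-way core of shape
    ranks[k] x in_modes[k] x out_modes[k] x ranks[k+1] for each mode."""
    assert len(in_modes) == len(out_modes) == len(ranks) - 1
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
        for k in range(len(in_modes))
    )

# A 1024 -> 1024 layer factored as 1024 = 8 * 16 * 8 on both the input and
# output sides, with TT ranks (1, 4, 4, 1):
full = dense_params(1024, 1024)                       # 1048576 parameters
tt = tt_params([8, 16, 8], [8, 16, 8], [1, 4, 4, 1])  # 4608 parameters
print(f"dense: {full}, TT: {tt}, compression: {full / tt:.1f}x")
```

With these toy ranks the TT form stores 4608 instead of 1048576 parameters, a roughly 227× reduction; compression factors such as the paper's reported 92.8× follow from the particular mode factorizations and TT/CP ranks chosen for each layer.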
| Main Authors: | Xian Xiao, Yequan Zhao, Yuan Yuan, Geza Kurczveil, Marco Fiorentino, Ray Beausoleil, Zheng Zhang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | AIP Publishing LLC, 2025-03-01 |
| Series: | APL Machine Learning |
| Online Access: | http://dx.doi.org/10.1063/5.0255883 |
| _version_ | 1850259634099585024 |
|---|---|
| author | Xian Xiao; Yequan Zhao; Yuan Yuan; Geza Kurczveil; Marco Fiorentino; Ray Beausoleil; Zheng Zhang |
| author_facet | Xian Xiao; Yequan Zhao; Yuan Yuan; Geza Kurczveil; Marco Fiorentino; Ray Beausoleil; Zheng Zhang |
| author_sort | Xian Xiao |
| collection | DOAJ |
| description | This paper proposes a real-size, single-shot, high-speed, and energy-efficient tensorized optical multimodal fusion network (TOMFuN) on an electro-photonic large-scale III–V-on-Si in-memory compute engine. The TOMFuN architecture leverages a memory-efficient, low-complexity self-attention mechanism for the text-embedding network, together with tensor-train and CANDECOMP/PARAFAC decompositions to compress the model parameters in the large-scale fully connected layers. Compared to its full-size counterparts, the proposed network maintains comparable inference accuracy on multimodal sentiment analysis tasks while requiring 92.8× fewer model parameters and 51.3× fewer hardware resources. Furthermore, the impact of photonic device imperfections on the TOMFuN architecture is investigated; simulation results show that noise-aware on-chip training exhibits superior robustness. Finally, chip performance analysis shows that the TOMFuN inference accelerator achieves a computational speed of 230.73 PetaOps, a power efficiency of 6.51 TOPS/W, and a latency of 2.7 µs at an input dimension of 1024. |
| format | Article |
| id | doaj-art-d98cd23513dc4e15b704c77c5031360e |
| institution | OA Journals |
| issn | 2770-9019 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | AIP Publishing LLC |
| record_format | Article |
| series | APL Machine Learning |
| spelling | doaj-art-d98cd23513dc4e15b704c77c5031360e (2025-08-20T01:55:49Z); eng; AIP Publishing LLC; APL Machine Learning; 2770-9019; 2025-03-01; vol. 3, no. 1, pages 016121 to 016121-15; 10.1063/5.0255883; TOMFuN: A tensorized optical multimodal fusion network; Xian Xiao [0]; Yequan Zhao [1]; Yuan Yuan [2]; Geza Kurczveil [3]; Marco Fiorentino [4]; Ray Beausoleil [5]; Zheng Zhang [6]. Affiliations: [0, 2, 3, 4, 5] Hewlett Packard Labs, Hewlett Packard Enterprise, 820 N. McCarthy Blvd., Milpitas, California 95305, USA; [1, 6] Department of Electrical and Computer Engineering, University of California, Santa Barbara, California 93106, USA. Abstract: This paper proposes a real-size, single-shot, high-speed, and energy-efficient tensorized optical multimodal fusion network (TOMFuN) on an electro-photonic large-scale III–V-on-Si in-memory compute engine. The TOMFuN architecture leverages a memory-efficient, low-complexity self-attention mechanism for the text-embedding network, together with tensor-train and CANDECOMP/PARAFAC decompositions to compress the model parameters in the large-scale fully connected layers. Compared to its full-size counterparts, the proposed network maintains comparable inference accuracy on multimodal sentiment analysis tasks while requiring 92.8× fewer model parameters and 51.3× fewer hardware resources. Furthermore, the impact of photonic device imperfections on the TOMFuN architecture is investigated; simulation results show that noise-aware on-chip training exhibits superior robustness. Finally, chip performance analysis shows that the TOMFuN inference accelerator achieves a computational speed of 230.73 PetaOps, a power efficiency of 6.51 TOPS/W, and a latency of 2.7 µs at an input dimension of 1024. http://dx.doi.org/10.1063/5.0255883 |
| spellingShingle | Xian Xiao; Yequan Zhao; Yuan Yuan; Geza Kurczveil; Marco Fiorentino; Ray Beausoleil; Zheng Zhang; TOMFuN: A tensorized optical multimodal fusion network; APL Machine Learning |
| title | TOMFuN: A tensorized optical multimodal fusion network |
| title_full | TOMFuN: A tensorized optical multimodal fusion network |
| title_fullStr | TOMFuN: A tensorized optical multimodal fusion network |
| title_full_unstemmed | TOMFuN: A tensorized optical multimodal fusion network |
| title_short | TOMFuN: A tensorized optical multimodal fusion network |
| title_sort | tomfun a tensorized optical multimodal fusion network |
| url | http://dx.doi.org/10.1063/5.0255883 |
| work_keys_str_mv | AT xianxiao tomfunatensorizedopticalmultimodalfusionnetwork AT yequanzhao tomfunatensorizedopticalmultimodalfusionnetwork AT yuanyuan tomfunatensorizedopticalmultimodalfusionnetwork AT gezakurczveil tomfunatensorizedopticalmultimodalfusionnetwork AT marcofiorentino tomfunatensorizedopticalmultimodalfusionnetwork AT raybeausoleil tomfunatensorizedopticalmultimodalfusionnetwork AT zhengzhang tomfunatensorizedopticalmultimodalfusionnetwork |