HiEndo: harnessing large-scale data for generating high-resolution laparoscopy videos under a two-stage framework

Bibliographic Details
Main Authors: Zhao Wang, Yeqian Zhang, Jiayi Gu, Yueyao Chen, Yonghao Long, Xiang Xia, Puhua Zhang, Chunchao Zhu, Zerui Wang, Qi Dou, Zheng Wang, Zizhen Zhang
Format: Article
Language: English
Published: Taylor & Francis Group, 2025-12-01
Series: Computer Assisted Surgery
Subjects: Generative AI; video generation; laparoscopy
Online Access:https://www.tandfonline.com/doi/10.1080/24699322.2025.2536643
Collection: DOAJ
Abstract: Recent successes in generative AI have demonstrated great potential in various medical scenarios. However, generating realistic, high-fidelity gastrointestinal laparoscopy videos remains largely unexplored. A recent work, Endora, proposed a basic generative model for the gastrointestinal laparoscopy scenario, but it produces low-resolution videos that cannot meet the practical needs of robotic surgery. To address this issue, we propose HiEndo, a two-stage video generation architecture for producing high-resolution, high-fidelity gastrointestinal laparoscopy videos. In the first stage, we build a diffusion transformer on top of Endora's basic capability to generate an initial low-resolution laparoscopy video. In the second stage, a super-resolution module raises the resolution of this initial video and refines its fine-grained details. Together, the two stages yield realistic, high-resolution laparoscopy videos that can meet the needs of real-world clinical use. We also collected a large-scale gastrointestinal laparoscopy video dataset of 61,270 video clips for training and validating the proposed method. Extensive experiments demonstrate the effectiveness of the framework; for example, our model achieves improvements of 15.1% in Fréchet Video Distance and 3.7% in F1 score over the previous state-of-the-art method.
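The abstract reports its headline result in Fréchet Video Distance (FVD), where lower values mean generated clips are statistically closer to real ones. As a hedged illustration, not code from the HiEndo authors, the sketch below computes the squared Fréchet distance between two Gaussians fitted to feature embeddings, d² = ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}), which is the quantity FVD reports. It assumes NumPy and assumes each video has already been mapped to a fixed-length feature vector (the original FVD formulation uses a pretrained I3D network; the extractor and sample sizes used for HiEndo are not stated in this record). The function names `frechet_distance` and `feature_stats` are hypothetical.

```python
import numpy as np


def _sqrt_psd(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T  # V diag(sqrt(vals)) V^T


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """Squared Frechet distance between N(mu1, sigma1) and N(mu2, sigma2).

    Uses Tr((S1 S2)^{1/2}) = Tr((S1^{1/2} S2 S1^{1/2})^{1/2}); the inner matrix is
    symmetric PSD, so its eigenvalues are real and non-negative.
    Hypothetical helper for illustration only.
    """
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    sigma1, sigma2 = np.asarray(sigma1, dtype=float), np.asarray(sigma2, dtype=float)
    root1 = _sqrt_psd(sigma1)
    inner = root1 @ sigma2 @ root1
    eig = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    trace_sqrt = float(np.sum(np.sqrt(eig)))          # Tr((S1 S2)^{1/2})
    mean_term = float(np.sum((mu1 - mu2) ** 2))       # ||mu1 - mu2||^2
    return mean_term + float(np.trace(sigma1) + np.trace(sigma2)) - 2.0 * trace_sqrt


def feature_stats(features: np.ndarray):
    """Mean and covariance of an (n_videos, feature_dim) array of embeddings (hypothetical helper)."""
    features = np.asarray(features, dtype=float)
    return features.mean(axis=0), np.cov(features, rowvar=False)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(256, 8))           # stand-in for real-video embeddings
    fake = rng.normal(loc=0.5, size=(256, 8))  # stand-in for generated-video embeddings
    print(frechet_distance(*feature_stats(real), *feature_stats(fake)))
```

In one dimension the formula reduces to (μ1 − μ2)² + (σ1 − σ2)², a quick sanity check for any implementation. The symmetric form of the matrix square root is a design choice that avoids complex-valued results from a general `sqrtm` on a non-symmetric product.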
ISSN: 2469-9322
Volume: 30
Issue: 1
Affiliations:
Zhao Wang: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
Yeqian Zhang: Department of Gastrointestinal Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Jiayi Gu: Department of Gastrointestinal Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Yueyao Chen: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
Yonghao Long: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
Xiang Xia: Department of Gastrointestinal Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Puhua Zhang: Department of Gastrointestinal Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Chunchao Zhu: Department of Gastrointestinal Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Zerui Wang: Cornerstone Robotics Ltd, HKSAR, China
Qi Dou: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
Zheng Wang: Department of Gastrointestinal Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Zizhen Zhang: Department of Gastrointestinal Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China