HiEndo: harnessing large-scale data for generating high-resolution laparoscopy videos under a two-stage framework
Recent success in generative AI has demonstrated great potential in various medical scenarios. However, how to generate realistic and high-fidelity gastrointestinal laparoscopy videos still lacks exploration. A recent work, Endora, proposes a basic generation model for a gastrointestinal laparoscopy...
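| Main Authors: | Zhao Wang, Yeqian Zhang, Jiayi Gu, Yueyao Chen, Yonghao Long, Xiang Xia, Puhua Zhang, Chunchao Zhu, Zerui Wang, Qi Dou, Zheng Wang, Zizhen Zhang |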
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Taylor & Francis Group, 2025-12-01 |
| Series: | Computer Assisted Surgery |
| Subjects: | Generative AI; video generation; laparoscopy |
| Online Access: | https://www.tandfonline.com/doi/10.1080/24699322.2025.2536643 |
| _version_ | 1849419014691880960 |
|---|---|
| author | Zhao Wang; Yeqian Zhang; Jiayi Gu; Yueyao Chen; Yonghao Long; Xiang Xia; Puhua Zhang; Chunchao Zhu; Zerui Wang; Qi Dou; Zheng Wang; Zizhen Zhang |
| author_facet | Zhao Wang; Yeqian Zhang; Jiayi Gu; Yueyao Chen; Yonghao Long; Xiang Xia; Puhua Zhang; Chunchao Zhu; Zerui Wang; Qi Dou; Zheng Wang; Zizhen Zhang |
| author_sort | Zhao Wang |
| collection | DOAJ |
| description | Recent success in generative AI has demonstrated great potential in various medical scenarios. However, the generation of realistic, high-fidelity gastrointestinal laparoscopy videos remains underexplored. A recent work, Endora, proposes a basic generation model for gastrointestinal laparoscopy scenarios, but it produces only low-resolution videos, which cannot meet the real needs of robotic surgery. To address this issue, we propose HiEndo, an innovative two-stage video generation architecture for generating high-resolution, high-fidelity gastrointestinal laparoscopy videos. In the first stage, we build a diffusion transformer that generates a low-resolution laparoscopy video, building on the basic capability of Endora as a starting point. In the second stage, we design a super-resolution module to increase the resolution of the initial video and refine its fine-grained details. With these two stages, we obtain realistic, high-resolution laparoscopy videos with high fidelity, which can meet real-world clinical needs. We also collect a large-scale gastrointestinal laparoscopy video dataset with 61,270 video clips for training and validating the proposed method. Extensive experimental results demonstrate the effectiveness of the proposed framework. For example, our model achieves improvements of 15.1% in Fréchet Video Distance and 3.7% in F1 score over the previous state-of-the-art method. |
| format | Article |
| id | doaj-art-03a573266ff34e588d401ad4ac217e21 |
| institution | Kabale University |
| issn | 2469-9322 |
| language | English |
| publishDate | 2025-12-01 |
| publisher | Taylor & Francis Group |
| record_format | Article |
| series | Computer Assisted Surgery |
| spelling | doaj-art-03a573266ff34e588d401ad4ac217e21; 2025-08-20T03:32:16Z; eng; Taylor & Francis Group; Computer Assisted Surgery; 2469-9322; 2025-12-01; 30; 1; 10.1080/24699322.2025.2536643; HiEndo: harnessing large-scale data for generating high-resolution laparoscopy videos under a two-stage framework; Zhao Wang [0]; Yeqian Zhang [1]; Jiayi Gu [2]; Yueyao Chen [3]; Yonghao Long [4]; Xiang Xia [5]; Puhua Zhang [6]; Chunchao Zhu [7]; Zerui Wang [8]; Qi Dou [9]; Zheng Wang [10]; Zizhen Zhang [11]; [0] Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China; [1] Department of Gastrointestinal Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; [2] Department of Gastrointestinal Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; [3] Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China; [4] Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China; [5] Department of Gastrointestinal Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; [6] Department of Gastrointestinal Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; [7] Department of Gastrointestinal Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; [8] Cornerstone Robotics Ltd, HKSAR, China; [9] Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China; [10] Department of Gastrointestinal Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; [11] Department of Gastrointestinal Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; https://www.tandfonline.com/doi/10.1080/24699322.2025.2536643; Generative AI; video generation; laparoscopy |
| spellingShingle | Zhao Wang Yeqian Zhang Jiayi Gu Yueyao Chen Yonghao Long Xiang Xia Puhua Zhang Chunchao Zhu Zerui Wang Qi Dou Zheng Wang Zizhen Zhang HiEndo: harnessing large-scale data for generating high-resolution laparoscopy videos under a two-stage framework Computer Assisted Surgery Generative AI video generation laparoscopy |
| title | HiEndo: harnessing large-scale data for generating high-resolution laparoscopy videos under a two-stage framework |
| title_full | HiEndo: harnessing large-scale data for generating high-resolution laparoscopy videos under a two-stage framework |
| title_fullStr | HiEndo: harnessing large-scale data for generating high-resolution laparoscopy videos under a two-stage framework |
| title_full_unstemmed | HiEndo: harnessing large-scale data for generating high-resolution laparoscopy videos under a two-stage framework |
| title_short | HiEndo: harnessing large-scale data for generating high-resolution laparoscopy videos under a two-stage framework |
| title_sort | hiendo harnessing large scale data for generating high resolution laparoscopy videos under a two stage framework |
| topic | Generative AI; video generation; laparoscopy |
| url | https://www.tandfonline.com/doi/10.1080/24699322.2025.2536643 |
| work_keys_str_mv | AT zhaowang hiendoharnessinglargescaledataforgeneratinghighresolutionlaparoscopyvideosunderatwostageframework AT yeqianzhang hiendoharnessinglargescaledataforgeneratinghighresolutionlaparoscopyvideosunderatwostageframework AT jiayigu hiendoharnessinglargescaledataforgeneratinghighresolutionlaparoscopyvideosunderatwostageframework AT yueyaochen hiendoharnessinglargescaledataforgeneratinghighresolutionlaparoscopyvideosunderatwostageframework AT yonghaolong hiendoharnessinglargescaledataforgeneratinghighresolutionlaparoscopyvideosunderatwostageframework AT xiangxia hiendoharnessinglargescaledataforgeneratinghighresolutionlaparoscopyvideosunderatwostageframework AT puhuazhang hiendoharnessinglargescaledataforgeneratinghighresolutionlaparoscopyvideosunderatwostageframework AT chunchaozhu hiendoharnessinglargescaledataforgeneratinghighresolutionlaparoscopyvideosunderatwostageframework AT zeruiwang hiendoharnessinglargescaledataforgeneratinghighresolutionlaparoscopyvideosunderatwostageframework AT qidou hiendoharnessinglargescaledataforgeneratinghighresolutionlaparoscopyvideosunderatwostageframework AT zhengwang hiendoharnessinglargescaledataforgeneratinghighresolutionlaparoscopyvideosunderatwostageframework AT zizhenzhang hiendoharnessinglargescaledataforgeneratinghighresolutionlaparoscopyvideosunderatwostageframework |
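
The description above reports an improvement in Fréchet Video Distance (FVD). As a rough illustration of what that metric measures, the sketch below computes the Fréchet distance between two Gaussians fitted to sets of feature vectors, which is the formula FVD is built on. It is not the authors' evaluation code: in practice the features come from a pretrained video network, whereas here random arrays stand in for them, and the function name `frechet_distance` is a hypothetical choice for this example.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets (rows are samples)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b; clip tiny negative values from round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

# Stand-in features (assumption): 500 samples of 8-dimensional vectors.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
fake = rng.normal(loc=0.5, size=(500, 8))

d_same = frechet_distance(real, real)  # identical sets: distance near zero
d_diff = frechet_distance(real, fake)  # shifted mean: clearly larger distance
```

A lower value means the generated features are statistically closer to the real ones, so an FVD "improvement" corresponds to a reduction in this distance.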