Spatial-temporal attention for video-based assessment of intraoperative surgical skill

Abstract Accurate, unbiased, and reproducible assessment of skill is a vital resource for surgeons throughout their career. The objective in this research is to develop and validate algorithms for video-based assessment of intraoperative surgical skill. Algorithms to classify surgical video into exp...

Bibliographic Details
Main Authors: Bohua Wan, Michael Peven, Gregory Hager, Shameema Sikder, S. Swaroop Vedula
Format: Article
Language: English
Published: Nature Portfolio 2024-11-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-024-77176-1
author Bohua Wan
Michael Peven
Gregory Hager
Shameema Sikder
S. Swaroop Vedula
collection DOAJ
description Abstract: Accurate, unbiased, and reproducible assessment of skill is a vital resource for surgeons throughout their careers. The objective of this research is to develop and validate algorithms for video-based assessment of intraoperative surgical skill. Algorithms that classify surgical video into expert or novice categories provide a summative assessment of skill, which is useful for evaluating surgeons at discrete time points in their training or for certification. Using a spatial-temporal neural network architecture, we tested the hypothesis that explicit supervision of spatial attention using instrument tip locations improves the algorithm’s generalizability to an unseen dataset. The best-performing model had an area under the receiver operating characteristic curve (AUC) of 0.88. Augmenting the network with supervision of spatial attention improved the specificity of its predictions (with small changes in sensitivity and AUC) and led to improved measures of discrimination when tested on an unseen dataset. Our findings show that explicit supervision of attention learned from images, using instrument tip locations, can improve the performance of algorithms for objective video-based assessment of surgical skill.
format Article
id doaj-art-c1bb6a79f1b74b5c8b3e8487449cc31f
institution DOAJ
issn 2045-2322
language English
publishDate 2024-11-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-c1bb6a79f1b74b5c8b3e8487449cc31f
2025-08-20T02:50:07Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2024-11-01
volume 14, issue 1, pages 1-14
10.1038/s41598-024-77176-1
Spatial-temporal attention for video-based assessment of intraoperative surgical skill
Bohua Wan (Department of Computer Science, Whiting School of Engineering, Johns Hopkins University)
Michael Peven (Department of Computer Science, Whiting School of Engineering, Johns Hopkins University)
Gregory Hager (Department of Computer Science, Whiting School of Engineering, Johns Hopkins University)
Shameema Sikder (Malone Center for Engineering in Healthcare, Johns Hopkins University)
S. Swaroop Vedula (Malone Center for Engineering in Healthcare, Johns Hopkins University)
https://doi.org/10.1038/s41598-024-77176-1
title Spatial-temporal attention for video-based assessment of intraoperative surgical skill
url https://doi.org/10.1038/s41598-024-77176-1