AI-Empowered Anticipation of Surgical Triplets
in Laparoscopic Videos for Enhanced Decision Support
Currently Anonymous

Abstract
The development of intra-operative, context-aware decision support systems is crucial for enhancing surgical decision-making and analysing surgical workflow. We introduce SurgAnt, a diffusion-based model designed to aid decision-making during surgery by predicting action triplets several frames ahead from laparoscopic video footage, where each action triplet is composed of <Instrument, Verb, Target>.
We evaluated our model on laparoscopic videos of procedures performed by professional surgeons. The results demonstrated that our model can accurately predict the next triplets in most cases, highlighting its potential to assist operating surgeons during surgery. Our model can also serve as a training tool for medical interns and can be integrated into other computer-assisted intervention (CAI) systems.

Technical Overview

Surgical Triplets

                      Triplet  Instrument  Action      Target
Number of Categories  100      6           10          15
Components                     grasper     grasp       gallbladder
                               bipolar     retract     cystic-plate
                               hook        dissect     cystic-duct
                               scissors    coagulate   cystic-artery
                               clipper     clip        cystic-pedicle
                               irrigator   cut         blood-vessel
                                           aspirate    fluid
                                           irrigate    abdominal-wall/cavity
                                           pack        liver
                                           null-verb   adhesion
                                                       omentum
                                                       peritoneum
                                                       gut
                                                       specimen-bag
                                                       null-target
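
The component vocabularies listed above can be written down as a small data structure. The following is a minimal Python sketch, not SurgAnt's actual code: the names INSTRUMENTS, VERBS, TARGETS, and Triplet are hypothetical, and the integer ids simply follow the order in which the components are listed here, which is an assumption. Only the individual components are validated, because the 100 valid triplet categories are not enumerated on this page.

```python
from dataclasses import dataclass

# Component vocabularies copied from the table above.
# The tuple order (and hence each integer id) is an assumption.
INSTRUMENTS = ("grasper", "bipolar", "hook", "scissors", "clipper", "irrigator")
VERBS = ("grasp", "retract", "dissect", "coagulate", "clip", "cut",
         "aspirate", "irrigate", "pack", "null-verb")
TARGETS = ("gallbladder", "cystic-plate", "cystic-duct", "cystic-artery",
           "cystic-pedicle", "blood-vessel", "fluid", "abdominal-wall/cavity",
           "liver", "adhesion", "omentum", "peritoneum", "gut",
           "specimen-bag", "null-target")


@dataclass(frozen=True)
class Triplet:
    """Hypothetical container for one <Instrument, Verb, Target> triplet."""
    instrument: str
    verb: str
    target: str

    def __post_init__(self):
        # Check each component against its vocabulary. This does NOT check
        # membership in the 100 triplet categories, which are not listed here.
        for value, vocab, name in (
            (self.instrument, INSTRUMENTS, "instrument"),
            (self.verb, VERBS, "verb"),
            (self.target, TARGETS, "target"),
        ):
            if value not in vocab:
                raise ValueError(f"unknown {name}: {value!r}")

    def component_ids(self):
        """Return (instrument_id, verb_id, target_id) under the assumed order."""
        return (
            INSTRUMENTS.index(self.instrument),
            VERBS.index(self.verb),
            TARGETS.index(self.target),
        )


# Example combination chosen only for illustration.
example = Triplet("grasper", "retract", "gallbladder")
print(example.component_ids())
```

The three vocabularies allow 6 x 10 x 15 = 900 component combinations, of which the table counts 100 as triplet categories, so a real pipeline would additionally need a lookup of the valid combinations.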

Module Structure

DEMO

Frame 1

Description for Frame 1

Frame 2

Description for Frame 2

Acknowledgements

The website template was borrowed from VIDM.