Friends Across Time: Multi-Scale Action Segmentation Transformer for Surgical Phase Recognition

22 January 2024
Bokai Zhang
Jiayuan Meng
Bin Cheng
Dean Biskup
Svetlana Petculescu
Angela Chapman
    ViT
    MedIm
Abstract

Automatic surgical phase recognition is a core technology for modern operating rooms and online surgical video assessment platforms. Current state-of-the-art methods use both spatial and temporal information to tackle the surgical phase recognition task. Building on this idea, we propose the Multi-Scale Action Segmentation Transformer (MS-AST) for offline surgical phase recognition and the Multi-Scale Action Segmentation Causal Transformer (MS-ASCT) for online surgical phase recognition. We use ResNet50 or EfficientNetV2-M for spatial feature extraction. MS-AST and MS-ASCT model temporal information at different scales with multi-scale temporal self-attention and multi-scale temporal cross-attention, which improves the capture of temporal relationships between frames and segments. On the Cholec80 dataset, our method achieves 95.26% accuracy for online and 96.15% accuracy for offline surgical phase recognition, setting new state-of-the-art results. Our method also achieves state-of-the-art results on non-medical datasets in the video action segmentation domain.
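The abstract separates an offline model (MS-AST) from an online, causal one (MS-ASCT), both attending over time at several scales. The sketch below illustrates only that distinction. It is not the authors' implementation: the abstract does not say how the scales are defined, so the local-window formulation, the window sizes, and the function names `temporal_attention_mask` and `multi_scale_masks` are all assumptions made for illustration.

```python
def temporal_attention_mask(num_frames, window, causal):
    """Return mask[i][j] == True when frame i may attend to frame j.

    Hypothetical helper: a local attention window of size `window`,
    assumed here as one possible notion of a temporal "scale".
    """
    mask = []
    for i in range(num_frames):
        row = []
        for j in range(num_frames):
            if causal:
                # Online setting: the current frame plus the previous
                # window - 1 frames; future frames are never visible.
                allowed = 0 <= i - j < window
            else:
                # Offline setting: a symmetric window centred on frame i.
                allowed = abs(i - j) <= window // 2
            row.append(allowed)
        mask.append(row)
    return mask


def multi_scale_masks(num_frames, windows, causal):
    """Build one mask per window size (the assumed set of scales)."""
    return {w: temporal_attention_mask(num_frames, w, causal) for w in windows}


if __name__ == "__main__":
    # Print the causal masks for two illustrative window sizes.
    for w, m in multi_scale_masks(6, windows=(2, 4), causal=True).items():
        print(f"window={w}")
        for row in m:
            print("".join("x" if allowed else "." for allowed in row))
```

In the causal masks every entry above the diagonal is off, which is the property an online recognizer needs because later frames do not yet exist at prediction time. The offline masks are symmetric around the diagonal, so each frame can use context from both directions.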
