Video-Mined Task Graphs for Keystep Recognition in Instructional Videos

Video-Mined Task Graphs for Keystep Recognition in Instructional Videos

17 July 2023

Santhosh Kumar Ramakrishnan

Triantafyllos Afouras

Kristen Grauman

Papers citing "Video-Mined Task Graphs for Keystep Recognition in Instructional Videos"

16 / 16 papers shown

Title
Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding Dibyadip Chatterjee Edoardo Remelli Yale Song Bugra Tekin Abhay Mittal ... Shreyas Hampali Eric Sauser Shugao Ma Angela Yao Fadime Sener VLM 46 0 0 10 Apr 2025
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering Thanh-Son Nguyen Hong Yang Tzeh Yuan Neoh Hao Zhang Ee Yeo Keat Basura Fernando NAI 59 0 0 19 Mar 2025
Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos Luigi Seminara G. Farinella Antonino Furnari 77 0 0 25 Feb 2025
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos Luigi Seminara G. Farinella Antonino Furnari 64 7 0 10 Jan 2025
Leveraging Surgical Activity Grammar for Primary Intention Prediction in Laparoscopy Procedures Jie Zhang Song Zhou Yiwei Wang Chidan Wan Huan Zhao Xiong Cai Han Ding 28 0 0 29 Sep 2024
ExpertAF: Expert Actionable Feedback from Video Kumar Ashutosh Tushar Nagarajan Georgios Pavlakos Kris M. Kitani Kristen Grauman VGen 44 2 0 01 Aug 2024
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos Hanlin Wang Yilu Wu Sheng Guo Limin Wang VGen DiffM 73 30 0 26 Mar 2023
Learning and Verification of Task Structure in Instructional Videos Medhini Narasimhan Licheng Yu Sean Bell Ning Zhang Trevor Darrell 68 19 0 23 Mar 2023
STEPs: Self-Supervised Key Step Extraction and Localization from Unlabeled Procedural Videos Anshul B. Shah Benjamin Lundell H. Sawhney Ramalingam Chellappa SSL 16 8 0 02 Jan 2023
UniFormer: Unifying Convolution and Self-attention for Visual Recognition Kunchang Li Yali Wang Junhao Zhang Peng Gao Guanglu Song Yu Liu Hongsheng Li Yu Qiao ViT 162 360 0 24 Jan 2022
Omnivore: A Single Model for Many Visual Modalities Rohit Girdhar Mannat Singh Nikhil Ravi L. V. D. van der Maaten Armand Joulin Ishan Misra 226 225 0 20 Jan 2022
Shaping embodied agent behavior with activity-context priors from egocentric video Tushar Nagarajan Kristen Grauman EgoV LM&Ro 43 13 0 14 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video Kristen Grauman Andrew Westbury Eugene Byrne Zachary Chavis Antonino Furnari ... Mike Zheng Shou Antonio Torralba Lorenzo Torresani Mingfei Yan Jitendra Malik EgoV 241 1,024 0 13 Oct 2021
Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning Jing Bi Jiebo Luo Chenliang Xu 76 48 0 05 Oct 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding Hu Xu Gargi Ghosh Po-Yao (Bernie) Huang Dmytro Okhonko Armen Aghajanyan Florian Metze Luke Zettlemoyer Florian Metze Luke Zettlemoyer Christoph Feichtenhofer CLIP VLM 259 558 0 28 Sep 2021
Is Space-Time Attention All You Need for Video Understanding? Gedas Bertasius Heng Wang Lorenzo Torresani ViT 280 1,982 0 09 Feb 2021