What You Say Is What You Show: Visual Narration Detection in Instructional Videos

5 January 2023

Lorenzo Torresani

Kristen Grauman

Papers citing "What You Say Is What You Show: Visual Narration Detection in Instructional Videos"

ExpertAF: Expert Actionable Feedback from Video Kumar Ashutosh Tushar Nagarajan Georgios Pavlakos Kris M. Kitani Kristen Grauman VGen 44 2 0 01 Aug 2024
Video Editing for Video Retrieval Bin Zhu Kevin Flanagan A. Fragomeni Michael Wray Dima Damen CLIP 31 0 0 04 Feb 2024
Detours for Navigating Instructional Videos Kumar Ashutosh Zihui Xue Tushar Nagarajan Kristen Grauman 23 6 0 03 Jan 2024
Video-Mined Task Graphs for Keystep Recognition in Instructional Videos Kumar Ashutosh Santhosh Kumar Ramakrishnan Triantafyllos Afouras Kristen Grauman 23 23 0 17 Jul 2023
Shaping embodied agent behavior with activity-context priors from egocentric video Tushar Nagarajan Kristen Grauman EgoV LM&Ro 43 13 0 14 Oct 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding Hu Xu Gargi Ghosh Po-Yao (Bernie) Huang Dmytro Okhonko Armen Aghajanyan Florian Metze Luke Zettlemoyer Christoph Feichtenhofer CLIP VLM 259 558 0 28 Sep 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text Hassan Akbari Liangzhe Yuan Rui Qian Wei-Hong Chuang Shih-Fu Chang Yin Cui Boqing Gong ViT 248 577 0 22 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval Huaishao Luo Lei Ji Ming Zhong Yang Chen Wen Lei Nan Duan Tianrui Li CLIP VLM 317 780 0 18 Apr 2021
Forecasting Action through Contact Representations from First Person Video Eadom Dessalene Chinmaya Devaraj Michael Maynord Cornelia Fermuller Yiannis Aloimonos EgoV 58 60 0 01 Feb 2021
Multi-modal Transformer for Video Retrieval Valentin Gabeur Chen Sun Karteek Alahari Cordelia Schmid ViT 415 595 0 21 Jul 2020
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation Gen Luo Yiyi Zhou Xiaoshuai Sun Liujuan Cao Chenglin Wu Cheng Deng Rongrong Ji ObjD 164 286 0 19 Mar 2020
Neural Modular Control for Embodied Question Answering Abhishek Das Georgia Gkioxari Stefan Lee Devi Parikh Dhruv Batra LM&Ro 132 127 0 26 Oct 2018
Efficient Estimation of Word Representations in Vector Space Tomáš Mikolov Kai Chen G. Corrado J. Dean 3DV 233 31,253 0 16 Jan 2013