COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis

7 March 2019

Jie Zhou

Papers citing "COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis"

50 / 73 papers shown

Title
Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding Thong Nguyen Zhiyuan Hu Xu Lin Cong-Duy Nguyen See-Kiong Ng Luu Anh Tuan VLM 22 0 0 19 May 2025
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant Haibo Wang Bo Feng Zhengfeng Lai Mingze Xu Shiyu Li Weifeng Ge Afshin Dehghan Meng Cao Ping Huang OffRL 62 0 0 08 May 2025
Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding Dibyadip Chatterjee Edoardo Remelli Yale Song Bugra Tekin Abhay Mittal ... Shreyas Hampali Eric Sauser Shugao Ma Angela Yao Fadime Sener VLM 51 0 0 10 Apr 2025
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering Thanh-Son Nguyen Hong Yang Tzeh Yuan Neoh Hao Zhang Ee Yeo Keat Basura Fernando NAI 64 0 0 19 Mar 2025
Stitch-a-Recipe: Video Demonstration from Multistep Descriptions Chi Hsuan Wu Kumar Ashutosh Kristen Grauman DiffM 68 0 0 18 Mar 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding Shehreen Azad Vibhav Vineet Yogesh S Rawat VLM 211 1 0 11 Mar 2025
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition Xin Ding Hao Wu Yue Yang Shiqi Jiang Donglin Bai Zhibo Chen Ting Cao 211 0 0 08 Mar 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation Mohammad Mahdi Abootorabi Amirhosein Zobeiri Mahdi Dehghani Mohammadali Mohammadkhani Bardia Mohammadi Omid Ghahroodi M. Baghshah Ehsaneddin Asgari RALM 105 5 0 12 Feb 2025
Do Language Models Understand Time? Xi Ding Lei Wang 184 0 0 18 Dec 2024
Progress-Aware Video Frame Captioning Zihui Xue Joungbin An Xitong Yang Kristen Grauman 108 1 0 03 Dec 2024
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning Xiangyu Zeng Kunchang Li Chenting Wang Xinhao Li Tianxiang Jiang ... Zhengrong Yue Yi Wang Yali Wang Yu Qiao Limin Wang MLLM VLM AI4TS 71 15 0 25 Oct 2024
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond Shanshan Han 87 1 0 09 Oct 2024
TRACE: Temporal Grounding Video LLM via Causal Event Modeling Yongxin Guo Jingyu Liu Mingda Li Xiaoying Tang Qingbin Liu Xiaoying Tang 49 14 0 08 Oct 2024
Enhancing Long Video Understanding via Hierarchical Event-Based Memory Dingxin Cheng Mingda Li Jingyu Liu Yongxin Guo Bin Jiang Qingbin Liu Xi Chen Bo Zhao 38 4 0 10 Sep 2024
ExpertAF: Expert Actionable Feedback from Video Kumar Ashutosh Tushar Nagarajan Georgios Pavlakos Kris Kitani Kristen Grauman VGen 44 2 0 01 Aug 2024
Tarsier: Recipes for Training and Evaluating Large Video Description Models Jiawei Wang Liping Yuan Yuchen Zhang 49 52 0 30 Jun 2024
GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension Jiafeng Liang Shixin Jiang Zekun Wang Haojie Pan Zerui Chen Zheng Chu Ming Liu Ruiji Fu Zhongyuan Wang Bing Qin 34 2 0 26 Jun 2024
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation Jiaming Zhou Teli Ma Kun-Yu Lin Ronghe Qiu Zifan Wang Junwei Liang 57 5 0 20 Jun 2024
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives Thong Nguyen Yi Bin Junbin Xiao Leigang Qu Yicong Li Jay Zhangjie Wu Cong-Duy Nguyen See-Kiong Ng Luu Anh Tuan VLM 61 10 1 09 Jun 2024
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges Badri N. Patro Vijay Srinivas Agneeswaran Mamba 48 38 0 24 Apr 2024
Video ReCap: Recursive Captioning of Hour-Long Videos Md. Mohaiminul Islam Ngan Ho Xitong Yang Tushar Nagarajan Lorenzo Torresani Gedas Bertasius VGen VLM 35 47 0 20 Feb 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding Long Zhao N. B. Gundavarapu Liangzhe Yuan Hao Zhou Shen Yan ... Huisheng Wang Hartwig Adam Mikhail Sirotenko Ting Liu Boqing Gong VGen 50 29 0 20 Feb 2024
Learning to Visually Connect Actions and their Effects Eric Peh Paritosh Parmar Basura Fernando 24 2 0 19 Jan 2024
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains Rohan Myer Krishnan Zitian Tang Zhiqiu Yu Chen Sun 59 1 0 30 Nov 2023
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition Jiaming Zhou Hanjun Li Kun-Yu Lin Junwei Liang 29 1 0 28 Nov 2023
Linguistically Motivated Sign Language Segmentation Amit Moryossef Zifan Jiang Mathias Müller Sarah Ebling Yoav Goldberg SLR 29 5 0 21 Oct 2023
UnLoc: A Unified Framework for Video Localization Tasks Shengjia Yan Xuehan Xiong Arsha Nagrani Anurag Arnab Zhonghao Wang Weina Ge David A. Ross Cordelia Schmid 36 53 0 21 Aug 2023
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos? Qi Zhao Shijie Wang Ce Zhang Changcheng Fu Minh Quan Do Nakul Agarwal Kwonjoon Lee Chen Sun LM&Ro 61 49 0 31 Jul 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures Kun Yuan V. Srivastav Tong Yu Joël L. Lavanchy Pietro Mascagni Pietro Mascagni N. Padoy Nicolas Padoy 37 20 0 27 Jul 2023
Action Anticipation with Goal Consistency Olga Zatsarynna Juergen Gall 30 10 0 26 Jun 2023
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment Zihui Xue Kristen Grauman EgoV 43 31 0 08 Jun 2023
Visual Transformation Telling Wanqing Cui Mustafa Nasir-Moin Yanyan Lan Viola J. Chen J. Guo Xueqi Cheng LRM 67 1 0 03 May 2023
Visual Reasoning: from State to Transformation Xin Hong Yanyan Lan Liang Pang J. Guo Xueqi Cheng LRM 27 4 0 02 May 2023
DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation Peiyao Wang Haibin Ling 15 2 0 04 Apr 2023
Procedure-Aware Pretraining for Instructional Video Understanding Honglu Zhou Roberto Martín-Martín Mubbasir Kapadia Silvio Savarese Juan Carlos Niebles 41 39 0 31 Mar 2023
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions Brian Chen Nina Shvetsova Andrew Rouditchenko D. Kondermann Samuel Thomas Shih-Fu Chang Rogerio Feris James R. Glass Hilde Kuehne 40 7 0 29 Mar 2023
Hierarchical Video-Moment Retrieval and Step-Captioning Abhaysinh Zala Jaemin Cho Satwik Kottur Xilun Chen Barlas Ouguz Yasher Mehdad Joey Tianyi Zhou 3DV 20 51 0 29 Mar 2023
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos Hanlin Wang Yilu Wu Sheng Guo Limin Wang VGen DiffM 78 30 0 26 Mar 2023
HierVL: Learning Hierarchical Video-Language Embeddings Kumar Ashutosh Rohit Girdhar Lorenzo Torresani Kristen Grauman VLM AI4TS 28 53 0 05 Jan 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos Kumar Ashutosh Rohit Girdhar Lorenzo Torresani Kristen Grauman 29 4 0 05 Jan 2023
Ego-Only: Egocentric Action Detection without Exocentric Transferring Huiyu Wang Mitesh Singh Lorenzo Torresani EgoV 77 24 0 03 Jan 2023
OpenPack: A Large-scale Dataset for Recognizing Packaging Works in IoT-enabled Logistic Environments Naoya Yoshimura Jaime Morales T. Maekawa Takahiro Hara 30 19 0 10 Dec 2022
Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation Jie Jiang Zhimin Li Jiangfeng Xiong Rongwei Quan Qinglin Lu Wei Liu 36 2 0 09 Dec 2022
Multi-Task Learning of Object State Changes from Uncurated Videos Tomávs Souvcek Jean-Baptiste Alayrac Antoine Miech Ivan Laptev Josef Sivic 41 11 0 24 Nov 2022
Human in the loop approaches in multi-modal conversational task guidance system development R. Manuvinakurike Sovan Biswas G. Raffa R. Beckwith A. Rhodes Meng Shi Gesem Gudino Mejia Saurav Sahay L. Nachman 38 2 0 03 Nov 2022
Temporal Action Segmentation: An Analysis of Modern Techniques Guodong Ding Fadime Sener Angela Yao 49 75 0 19 Oct 2022
Robust Action Segmentation from Timestamp Supervision Yaser Souri Yazan Abu Farha Emad Bahrami Gianpiero Francesca Juergen Gall 27 6 0 12 Oct 2022
Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning Yuchong Sun Hongwei Xue Ruihua Song Bei Liu Huan Yang Jianlong Fu AI4TS VLM 20 68 0 12 Oct 2022
A Closer Look at Temporal Ordering in the Segmentation of Instructional Videos Anil Batra Shreyank N. Gowda Frank Keller Laura Sevilla-Lara 44 5 0 30 Sep 2022
SVGraph: Learning Semantic Graphs from Instructional Videos Madeline Chantry Schiappa Yogesh S Rawat 17 4 0 16 Jul 2022