Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1903.08225
Cited By
v1
v2 (latest)
Cross-task weakly supervised learning from instructional videos
19 March 2019
Dimitri Zhukov
Jean-Baptiste Alayrac
R. G. Cinbis
David Fouhey
Ivan Laptev
Josef Sivic
SSL
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Cross-task weakly supervised learning from instructional videos"
50 / 174 papers shown
Title
Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos
Kumaranage Ravindu Yasas Nagasinghe
Honglu Zhou
Malitha Gunawardhana
Martin Renqiang Min
Daniel Harari
Muhammad Haris Khan
102
7
0
05 Mar 2024
SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos
Yulei Niu
Wenliang Guo
Long Chen
Xudong Lin
Shih-Fu Chang
87
12
0
03 Mar 2024
CI w/o TN: Context Injection without Task Name for Procedure Planning
Xinjie Li
65
0
0
23 Feb 2024
Video ReCap: Recursive Captioning of Hour-Long Videos
Md. Mohaiminul Islam
Ngan Ho
Xitong Yang
Tushar Nagarajan
Lorenzo Torresani
Gedas Bertasius
VGen
VLM
99
50
0
20 Feb 2024
FineBio: A Fine-Grained Video Dataset of Biological Experiments with Hierarchical Annotation
Takuma Yagi
Misaki Ohashi
Yifei Huang
Ryosuke Furuta
Shungo Adachi
Toutai Mitsuyama
Yoichi Sato
72
6
0
01 Feb 2024
Detours for Navigating Instructional Videos
Kumar Ashutosh
Zihui Xue
Tushar Nagarajan
Kristen Grauman
125
6
0
03 Jan 2024
CaptainCook4D: A dataset for understanding errors in procedural activities
Rohith Peddi
Shivvrat Arya
B. Challa
Likhitha Pallapothula
Akshay Vyas
...
Vasundhara Komaragiri
Eric D. Ragan
Nicholas Ruozzi
Yu Xiang
Vibhav Gogate
102
14
0
22 Dec 2023
A Strong Baseline for Temporal Video-Text Alignment
Zeqian Li
Qirui Chen
Tengda Han
Ya Zhang
Yanfeng Wang
Weidi Xie
AI4TS
VGen
88
5
0
21 Dec 2023
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
Yi Chen
Yuying Ge
Yixiao Ge
Mingyu Ding
Bohao Li
Rui Wang
Rui-Lan Xu
Ying Shan
Xihui Liu
LLMAG
ELM
LRM
94
13
0
11 Dec 2023
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan
Zitian Tang
Zhiqiu Yu
Chen Sun
144
2
0
30 Nov 2023
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos
Takehiko Ohkawa
Takuma Yagi
Taichi Nishimura
Ryosuke Furuta
Atsushi Hashimoto
Yoshitaka Ushiku
Yoichi Sato
EgoV
73
10
0
28 Nov 2023
Efficient Pre-training for Localized Instruction Generation of Videos
Anil Batra
Davide Moltisanti
Laura Sevilla-Lara
Marcus Rohrbach
Frank Keller
70
0
0
27 Nov 2023
United We Stand, Divided We Fall: UnityGraph for Unsupervised Procedure Learning from Videos
Siddhant Bansal
Chetan Arora
C. V. Jawahar
108
6
0
06 Nov 2023
ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab
Jieming Cui
Ziren Gong
Baoxiong Jia
Siyuan Huang
Zilong Zheng
Jianzhu Ma
Yixin Zhu
96
3
0
01 Nov 2023
IndustReal: A Dataset for Procedure Step Recognition Handling Execution Errors in Egocentric Videos in an Industrial-Like Setting
Tim J. Schoonbeek
Tim Houben
H. Onvlee
Peter H. N. de With
Fons van der Sommen
115
24
0
26 Oct 2023
Skip-Plan: Procedure Planning in Instructional Videos via Condensed Action Space Learning
Zhiheng Li
Wenjia Geng
Muheng Li
Lei Chen
Yansong Tang
Jiwen Lu
Jie Zhou
74
10
0
01 Oct 2023
Video-adverb retrieval with compositional adverb-action embeddings
Thomas Hummel
Otniel-Bogdan Mercea
A. Sophia Koepke
Zeynep Akata
66
1
0
26 Sep 2023
ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios
Francesco Ragusa
Rosario Leonardi
Michele Mazzamuto
Claudia Bonanno
Rosario Scavo
Antonino Furnari
G. Farinella
67
7
0
26 Sep 2023
Chop & Learn: Recognizing and Generating Object-State Compositions
Nirat Saini
Hanyu Wang
Archana Swaminathan
Vinoj Jayasundara
Bo He
Kamal Gupta
Abhinav Shrivastava
CoGe
76
12
0
25 Sep 2023
Masked Diffusion with Task-awareness for Procedure Planning in Instructional Videos
Fen Fang
Yun Liu
Ali Koksal
Qianli Xu
Joo-Hwee Lim
VGen
DiffM
71
6
0
14 Sep 2023
BIT: Bi-Level Temporal Modeling for Efficient Supervised Action Segmentation
Zijia Lu
Ehsan Elhamifar
76
2
0
28 Aug 2023
Are current long-term video understanding datasets long-term?
Ombretta Strafforello
Klamer Schutte
Jan van Gemert
54
8
0
22 Aug 2023
Event-Guided Procedure Planning from Instructional Videos with Text Supervision
Ante Wang
Kun-Li Channing Lin
Jiachen Du
Jingke Meng
Wei-Shi Zheng
67
16
0
17 Aug 2023
Every Mistake Counts in Assembly
Guodong Ding
Fadime Sener
Shugao Ma
Angela Yao
67
13
0
31 Jul 2023
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
Qi Zhao
Shijie Wang
Ce Zhang
Changcheng Fu
Minh Quan Do
Nakul Agarwal
Kwonjoon Lee
Chen Sun
LM&Ro
126
51
0
31 Jul 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Kun Yuan
V. Srivastav
Tong Yu
Joël L. Lavanchy
J. Marescaux
Pietro Mascagni
Nassir Navab
N. Padoy
186
23
0
27 Jul 2023
Video-Mined Task Graphs for Keystep Recognition in Instructional Videos
Kumar Ashutosh
Santhosh Kumar Ramakrishnan
Triantafyllos Afouras
Kristen Grauman
123
25
0
17 Jul 2023
Exploring the Role of Audio in Video Captioning
Yuhan Shen
Linjie Yang
Longyin Wen
Haichao Yu
Ehsan Elhamifar
Heng Wang
65
2
0
21 Jun 2023
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment
Zihui Xue
Kristen Grauman
EgoV
105
40
0
08 Jun 2023
Learning to Ground Instructional Articles in Videos through Narrations
E. Mavroudi
Triantafyllos Afouras
Lorenzo Torresani
DiffM
85
24
0
06 Jun 2023
Non-Sequential Graph Script Induction via Multimedia Grounding
Yu Zhou
Sha Li
Manling Li
Xudong Lin
Shih-Fu Chang
Joey Tianyi Zhou
Heng Ji
64
8
0
27 May 2023
Visual Transformation Telling
Wanqing Cui
Mustafa Nasir-Moin
Yanyan Lan
Viola J. Chen
Jiafeng Guo
Xueqi Cheng
LRM
108
1
0
03 May 2023
StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos
Nikita Dvornik
Isma Hadji
Ran Zhang
Konstantinos G. Derpanis
Animesh Garg
Richard P. Wildes
Allan D. Jepson
82
27
0
26 Apr 2023
Pretrained Language Models as Visual Planners for Human Assistance
Dhruvesh Patel
H. Eghbalzadeh
Nitin Kamra
Michael L. Iuzzolino
Unnat Jain
Ruta Desai
LM&Ro
87
25
0
17 Apr 2023
Procedure-Aware Pretraining for Instructional Video Understanding
Honglu Zhou
Roberto Martín-Martín
Mubbasir Kapadia
Silvio Savarese
Juan Carlos Niebles
123
40
0
31 Mar 2023
Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations
Yiwu Zhong
Licheng Yu
Yang Bai
Shangwen Li
Xueting Yan
Yin Li
AI4TS
103
34
0
31 Mar 2023
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
D. Kondermann
Samuel Thomas
Shih-Fu Chang
Rogerio Feris
James R. Glass
Hilde Kuehne
112
7
0
29 Mar 2023
Hierarchical Video-Moment Retrieval and Step-Captioning
Abhaysinh Zala
Jaemin Cho
Satwik Kottur
Xilun Chen
Barlas Ouguz
Yasher Mehdad
Joey Tianyi Zhou
3DV
87
54
0
29 Mar 2023
CelebV-Text: A Large-Scale Facial Text-Video Dataset
Jianhui Yu
Hao Zhu
Liming Jiang
Chen Change Loy
Weidong (Tom) Cai
Wayne Wu
74
62
0
26 Mar 2023
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
Hanlin Wang
Yilu Wu
Sheng Guo
Limin Wang
VGen
DiffM
169
31
0
26 Mar 2023
Learning and Verification of Task Structure in Instructional Videos
Medhini Narasimhan
Licheng Yu
Sean Bell
Ning Zhang
Trevor Darrell
119
19
0
23 Mar 2023
Unsupervised Task Graph Generation from Instructional Video Transcripts
Lajanugen Logeswaran
Sungryull Sohn
Y. Jang
Moontae Lee
Ho Hin Lee
55
8
0
17 Feb 2023
Multimodal Subtask Graph Generation from Instructional Videos
Y. Jang
Sungryull Sohn
Lajanugen Logeswaran
Tiange Luo
Moontae Lee
Ho Hin Lee
72
10
0
17 Feb 2023
LaMPP: Language Models as Probabilistic Priors for Perception and Action
Belinda Z. Li
William Chen
Pratyusha Sharma
Jacob Andreas
50
15
0
03 Feb 2023
Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding
Juncheng Li
Siliang Tang
Linchao Zhu
Wenqiao Zhang
Yi Yang
Tat-Seng Chua
Fei Wu
Yueting Zhuang
BDL
81
17
0
22 Jan 2023
Action Dynamics Task Graphs for Learning Plannable Representations of Procedural Tasks
Weichao Mao
Ruta Desai
Michael L. Iuzzolino
Nitin Kamra
79
5
0
11 Jan 2023
HierVL: Learning Hierarchical Video-Language Embeddings
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
VLM
AI4TS
111
59
0
05 Jan 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
105
4
0
05 Jan 2023
Multi-queue Momentum Contrast for Microvideo-Product Retrieval
Yali Du
Yin-wei Wei
Wei Ji
Fan Liu
Xin Luo
Liqiang Nie
89
16
0
22 Dec 2022
Multi-Task Learning of Object State Changes from Uncurated Videos
Tomávs Souvcek
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
93
11
0
24 Nov 2022
Previous
1
2
3
4
Next