Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1903.08225
Cited By
v1
v2 (latest)
Cross-task weakly supervised learning from instructional videos
19 March 2019
Dimitri Zhukov
Jean-Baptiste Alayrac
R. G. Cinbis
David Fouhey
Ivan Laptev
Josef Sivic
SSL
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Cross-task weakly supervised learning from instructional videos"
50 / 174 papers shown
Title
Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey
Yuecong Xu
Haozhi Cao
Zhenghua Chen
Xiaoli Li
Lihua Xie
Jianfei Yang
80
15
0
17 Nov 2022
Weakly-Supervised Temporal Article Grounding
Long Chen
Yulei Niu
Brian Chen
Xudong Lin
G. Han
Christopher Thomas
Hammad A. Ayyubi
Heng Ji
Shih-Fu Chang
AI4TS
86
13
0
22 Oct 2022
Temporal Action Segmentation: An Analysis of Modern Techniques
Guodong Ding
Fadime Sener
Angela Yao
188
80
0
19 Oct 2022
Robust Action Segmentation from Timestamp Supervision
Yaser Souri
Yazan Abu Farha
Emad Bahrami
Gianpiero Francesca
Juergen Gall
39
6
0
12 Oct 2022
Graph2Vid: Flow graph to Video Grounding for Weakly-supervised Multi-Step Localization
Nikita Dvornik
Isma Hadji
Hai X. Pham
Dhaivat Bhatt
Brais Martínez
Afsaneh Fazly
Allan D. Jepson
104
6
0
10 Oct 2022
A Closer Look at Temporal Ordering in the Segmentation of Instructional Videos
Anil Batra
Shreyank N. Gowda
Frank Keller
Laura Sevilla-Lara
84
5
0
30 Sep 2022
TVLT: Textless Vision-Language Transformer
Zineng Tang
Jaemin Cho
Yixin Nie
Joey Tianyi Zhou
VLM
135
31
0
28 Sep 2022
TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency
Medhini Narasimhan
Arsha Nagrani
Chen Sun
Michael Rubinstein
Trevor Darrell
Anna Rohrbach
Cordelia Schmid
83
35
0
14 Aug 2022
Video Question Answering with Iterative Video-Text Co-Tokenization
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
99
18
0
01 Aug 2022
LocVTP: Video-Text Pre-training for Temporal Localization
Meng Cao
Tianyu Yang
Junwu Weng
Can Zhang
Jue Wang
Yuexian Zou
90
65
0
21 Jul 2022
Disentangled Action Recognition with Knowledge Bases
Zhekun Luo
Shalini Ghosh
Devin Guillory
Keizo Kato
Trevor Darrell
Huijuan Xu
73
7
0
04 Jul 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Yogesh S Rawat
M. Shah
SSL
128
136
0
18 Jun 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
Henghui Zhao
Isma Hadji
Nikita Dvornik
Konstantinos G. Derpanis
Richard P. Wildes
Allan D. Jepson
91
46
0
04 May 2022
A Multi-level Alignment Training Scheme for Video-and-Language Grounding
Yubo Zhang
Feiyang Niu
Q. Ping
Govind Thattai
CVBM
85
2
0
22 Apr 2022
Temporal Alignment Networks for Long-term Video
Tengda Han
Weidi Xie
Andrew Zisserman
AI4TS
95
88
0
06 Apr 2022
Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation
Wangbo Zhao
Kai Wang
Xiangxiang Chu
Fuzhao Xue
Xinchao Wang
Yang You
99
22
0
06 Apr 2022
Video-Text Representation Learning via Differentiable Weak Temporal Alignment
Dohwan Ko
Joonmyung Choi
Juyeon Ko
Shinyeong Noh
Kyoung-Woon On
Eun-Sol Kim
Hyunwoo J. Kim
VGen
AI4TS
74
22
0
31 Mar 2022
Text-Driven Video Acceleration: A Weakly-Supervised Reinforcement Learning Method
W. Ramos
M. Silva
Edson R. Araujo
Victor Moura
Keller Clayderman Martins de Oliveira
Leandro Soriano Marcolino
Erickson R. Nascimento
VGen
57
3
0
29 Mar 2022
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
Fadime Sener
Dibyadip Chatterjee
Daniel Shelepov
Kun He
Dipika Singhania
Robert Y. Wang
Angela Yao
VGen
100
221
0
28 Mar 2022
Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos
Muheng Li
Lei Chen
Yueqi Duan
Zhilan Hu
Jianjiang Feng
Jie Zhou
Jiwen Lu
79
76
0
26 Mar 2022
Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning
Juncheng Li
Junlin Xie
Long Qian
Linchao Zhu
Siliang Tang
Leilei Gan
Yi Yang
Yueting Zhuang
Xinze Wang
95
75
0
24 Mar 2022
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
Hazel Doughty
Cees G. M. Snoek
133
19
0
23 Mar 2022
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Tomávs Souvcek
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
99
33
0
22 Mar 2022
Towards Visual-Prompt Temporal Answering Grounding in Medical Instructional Video
Bin Li
Yixuan Weng
Bin Sun
Shutao Li
135
33
0
13 Mar 2022
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs
Oana Ignat
Santiago Castro
Yuhang Zhou
Jiajun Bao
Dandan Shan
Rada Mihalcea
46
3
0
16 Feb 2022
Learning To Recognize Procedural Activities with Distant Supervision
Xudong Lin
Fabio Petroni
Gedas Bertasius
Marcus Rohrbach
Shih-Fu Chang
Lorenzo Torresani
105
87
0
26 Jan 2022
Boundary-aware Self-supervised Learning for Video Scene Segmentation
Jonghwan Mun
Minchul Shin
Gunsoo Han
Sangho Lee
S. Ha
Joonseok Lee
Eun-Sol Kim
SSL
98
20
0
14 Jan 2022
Bridging Video-text Retrieval with Multiple Choice Questions
Yuying Ge
Yixiao Ge
Xihui Liu
Dian Li
Ying Shan
Xiaohu Qie
Ping Luo
BDL
90
109
0
13 Jan 2022
Low-Rank Constraints for Fast Inference in Structured Models
Justin T. Chiu
Yuntian Deng
Alexander M. Rush
BDL
78
14
0
08 Jan 2022
SVIP: Sequence VerIfication for Procedures in Videos
Yichen Qian
Weixin Luo
Dongze Lian
Xu Tang
P. Zhao
Shenghua Gao
ViT
105
18
0
13 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David Harwath
James R. Glass
Hilde Kuehne
ViT
129
134
0
08 Dec 2021
Routing with Self-Attention for Multimodal Capsule Networks
Kevin Duarte
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
Samuel Thomas
Alexander H. Liu
David Harwath
James R. Glass
Hilde Kuehne
M. Shah
SSL
57
5
0
01 Dec 2021
Object-Region Video Transformers
Roei Herzig
Elad Ben-Avraham
K. Mangalam
Amir Bar
Gal Chechik
Anna Rohrbach
Trevor Darrell
Amir Globerson
ViT
86
84
0
13 Oct 2021
Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos
Reza Ghoddoosian
S. Sayed
V. Athitsos
76
15
0
12 Oct 2021
Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning
Jing Bi
Jiebo Luo
Chenliang Xu
124
49
0
05 Oct 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Florian Metze Luke Zettlemoyer Christoph Feichtenhofer
CLIP
VLM
315
582
0
28 Sep 2021
Survey: Transformer based Video-Language Pre-training
Ludan Ruan
Qin Jin
VLM
ViT
125
45
0
21 Sep 2021
PlaTe: Visually-Grounded Planning with Transformers in Procedural Tasks
Jiankai Sun
De-An Huang
Bo Lu
Yunhui Liu
Bolei Zhou
Animesh Garg
63
56
0
10 Sep 2021
Reconstructing and grounding narrated instructional videos in 3D
Dimitri Zhukov
Ignacio Rocco
Ivan Laptev
Josef Sivic
Johannes L. Schnberger
Bugra Tekin
Marc Pollefeys
23
0
0
09 Sep 2021
Drop-DTW: Aligning Common Signal Between Sequences While Dropping Outliers
Nikita Dvornik
Isma Hadji
Konstantinos G. Derpanis
Animesh Garg
Allan D. Jepson
51
51
0
26 Aug 2021
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
Jianwei Yang
Yonatan Bisk
Jianfeng Gao
113
140
0
23 Aug 2021
PixelSynth: Generating a 3D-Consistent Experience from a Single Image
C. Rockwell
David Fouhey
Justin Johnson
VGen
146
85
0
12 Aug 2021
Unsupervised Discovery of Actions in Instructional Videos
A. Piergiovanni
A. Angelova
Michael S. Ryoo
Irfan Essa
36
3
0
28 Jun 2021
Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding
Mingjie Sun
Jimin Xiao
Eng Gee Lim
Si Liu
John Y. Goulermas
ObjD
82
162
0
08 Jun 2021
Transferring Knowledge from Text to Video: Zero-Shot Anticipation for Procedural Actions
Fadime Sener
Rishabh Saraf
Angela Yao
LM&Ro
59
12
0
06 Jun 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Prahal Arora
Masoumeh Aminzadeh
Christoph Feichtenhofer
Florian Metze
Luke Zettlemoyer
80
133
0
20 May 2021
Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos
Brian Chen
Andrew Rouditchenko
Kevin Duarte
Hilde Kuehne
Samuel Thomas
...
Rogerio Feris
David Harwath
James R. Glass
M. Picheny
Shih-Fu Chang
SSL
83
92
0
26 Apr 2021
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning
Luowei Zhou
Jingjing Liu
Yu Cheng
Zhe Gan
Lei Zhang
52
7
0
01 Apr 2021
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision
Andrew Shin
Masato Ishii
T. Narihira
133
39
0
06 Mar 2021
Learning Temporal Dynamics from Cycles in Narrated Video
Dave Epstein
Jiajun Wu
Cordelia Schmid
Chen Sun
AI4TS
104
14
0
07 Jan 2021
Previous
1
2
3
4
Next