ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

arXiv: 1903.08225
Cross-task weakly supervised learning from instructional videos


19 March 2019
Dimitri Zhukov
Jean-Baptiste Alayrac
R. G. Cinbis
David Fouhey
Ivan Laptev
Josef Sivic
    SSL

Papers citing "Cross-task weakly supervised learning from instructional videos"

50 / 174 papers shown
Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey
Yuecong Xu
Haozhi Cao
Zhenghua Chen
Xiaoli Li
Lihua Xie
Jianfei Yang
80
15
0
17 Nov 2022
Weakly-Supervised Temporal Article Grounding
Long Chen
Yulei Niu
Brian Chen
Xudong Lin
G. Han
Christopher Thomas
Hammad A. Ayyubi
Heng Ji
Shih-Fu Chang
AI4TS
86
13
0
22 Oct 2022
Temporal Action Segmentation: An Analysis of Modern Techniques
Guodong Ding
Fadime Sener
Angela Yao
188
80
0
19 Oct 2022
Robust Action Segmentation from Timestamp Supervision
Yaser Souri
Yazan Abu Farha
Emad Bahrami
Gianpiero Francesca
Juergen Gall
39
6
0
12 Oct 2022
Graph2Vid: Flow graph to Video Grounding for Weakly-supervised Multi-Step Localization
Nikita Dvornik
Isma Hadji
Hai X. Pham
Dhaivat Bhatt
Brais Martínez
Afsaneh Fazly
Allan D. Jepson
104
6
0
10 Oct 2022
A Closer Look at Temporal Ordering in the Segmentation of Instructional Videos
Anil Batra
Shreyank N. Gowda
Frank Keller
Laura Sevilla-Lara
84
5
0
30 Sep 2022
TVLT: Textless Vision-Language Transformer
Zineng Tang
Jaemin Cho
Yixin Nie
Joey Tianyi Zhou
VLM
135
31
0
28 Sep 2022
TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency
Medhini Narasimhan
Arsha Nagrani
Chen Sun
Michael Rubinstein
Trevor Darrell
Anna Rohrbach
Cordelia Schmid
83
35
0
14 Aug 2022
Video Question Answering with Iterative Video-Text Co-Tokenization
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
99
18
0
01 Aug 2022
LocVTP: Video-Text Pre-training for Temporal Localization
Meng Cao
Tianyu Yang
Junwu Weng
Can Zhang
Jue Wang
Yuexian Zou
90
65
0
21 Jul 2022
Disentangled Action Recognition with Knowledge Bases
Zhekun Luo
Shalini Ghosh
Devin Guillory
Keizo Kato
Trevor Darrell
Huijuan Xu
73
7
0
04 Jul 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Yogesh S Rawat
M. Shah
SSL
128
136
0
18 Jun 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
Henghui Zhao
Isma Hadji
Nikita Dvornik
Konstantinos G. Derpanis
Richard P. Wildes
Allan D. Jepson
91
46
0
04 May 2022
A Multi-level Alignment Training Scheme for Video-and-Language Grounding
Yubo Zhang
Feiyang Niu
Q. Ping
Govind Thattai
CVBM
85
2
0
22 Apr 2022
Temporal Alignment Networks for Long-term Video
Tengda Han
Weidi Xie
Andrew Zisserman
AI4TS
95
88
0
06 Apr 2022
Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation
Wangbo Zhao
Kai Wang
Xiangxiang Chu
Fuzhao Xue
Xinchao Wang
Yang You
99
22
0
06 Apr 2022
Video-Text Representation Learning via Differentiable Weak Temporal Alignment
Dohwan Ko
Joonmyung Choi
Juyeon Ko
Shinyeong Noh
Kyoung-Woon On
Eun-Sol Kim
Hyunwoo J. Kim
VGen AI4TS
74
22
0
31 Mar 2022
Text-Driven Video Acceleration: A Weakly-Supervised Reinforcement Learning Method
W. Ramos
M. Silva
Edson R. Araujo
Victor Moura
Keller Clayderman Martins de Oliveira
Leandro Soriano Marcolino
Erickson R. Nascimento
VGen
57
3
0
29 Mar 2022
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
Fadime Sener
Dibyadip Chatterjee
Daniel Shelepov
Kun He
Dipika Singhania
Robert Y. Wang
Angela Yao
VGen
100
221
0
28 Mar 2022
Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos
Muheng Li
Lei Chen
Yueqi Duan
Zhilan Hu
Jianjiang Feng
Jie Zhou
Jiwen Lu
79
76
0
26 Mar 2022
Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning
Juncheng Li
Junlin Xie
Long Qian
Linchao Zhu
Siliang Tang
Leilei Gan
Yi Yang
Yueting Zhuang
Xinze Wang
95
75
0
24 Mar 2022
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
Hazel Doughty
Cees G. M. Snoek
133
19
0
23 Mar 2022
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Tomáš Souček
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
99
33
0
22 Mar 2022
Towards Visual-Prompt Temporal Answering Grounding in Medical Instructional Video
Bin Li
Yixuan Weng
Bin Sun
Shutao Li
135
33
0
13 Mar 2022
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs
Oana Ignat
Santiago Castro
Yuhang Zhou
Jiajun Bao
Dandan Shan
Rada Mihalcea
46
3
0
16 Feb 2022
Learning To Recognize Procedural Activities with Distant Supervision
Xudong Lin
Fabio Petroni
Gedas Bertasius
Marcus Rohrbach
Shih-Fu Chang
Lorenzo Torresani
105
87
0
26 Jan 2022
Boundary-aware Self-supervised Learning for Video Scene Segmentation
Jonghwan Mun
Minchul Shin
Gunsoo Han
Sangho Lee
S. Ha
Joonseok Lee
Eun-Sol Kim
SSL
98
20
0
14 Jan 2022
Bridging Video-text Retrieval with Multiple Choice Questions
Yuying Ge
Yixiao Ge
Xihui Liu
Dian Li
Ying Shan
Xiaohu Qie
Ping Luo
BDL
90
109
0
13 Jan 2022
Low-Rank Constraints for Fast Inference in Structured Models
Justin T. Chiu
Yuntian Deng
Alexander M. Rush
BDL
78
14
0
08 Jan 2022
SVIP: Sequence VerIfication for Procedures in Videos
Yichen Qian
Weixin Luo
Dongze Lian
Xu Tang
P. Zhao
Shenghua Gao
ViT
105
18
0
13 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David Harwath
James R. Glass
Hilde Kuehne
ViT
129
134
0
08 Dec 2021
Routing with Self-Attention for Multimodal Capsule Networks
Kevin Duarte
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
Samuel Thomas
Alexander H. Liu
David Harwath
James R. Glass
Hilde Kuehne
M. Shah
SSL
57
5
0
01 Dec 2021
Object-Region Video Transformers
Roei Herzig
Elad Ben-Avraham
K. Mangalam
Amir Bar
Gal Chechik
Anna Rohrbach
Trevor Darrell
Amir Globerson
ViT
86
84
0
13 Oct 2021
Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos
Reza Ghoddoosian
S. Sayed
V. Athitsos
76
15
0
12 Oct 2021
Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning
Jing Bi
Jiebo Luo
Chenliang Xu
124
49
0
05 Oct 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP VLM
315
582
0
28 Sep 2021
Survey: Transformer based Video-Language Pre-training
Ludan Ruan
Qin Jin
VLM ViT
125
45
0
21 Sep 2021
PlaTe: Visually-Grounded Planning with Transformers in Procedural Tasks
Jiankai Sun
De-An Huang
Bo Lu
Yunhui Liu
Bolei Zhou
Animesh Garg
63
56
0
10 Sep 2021
Reconstructing and grounding narrated instructional videos in 3D
Dimitri Zhukov
Ignacio Rocco
Ivan Laptev
Josef Sivic
Johannes L. Schönberger
Bugra Tekin
Marc Pollefeys
23
0
0
09 Sep 2021
Drop-DTW: Aligning Common Signal Between Sequences While Dropping Outliers
Nikita Dvornik
Isma Hadji
Konstantinos G. Derpanis
Animesh Garg
Allan D. Jepson
51
51
0
26 Aug 2021
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
Jianwei Yang
Yonatan Bisk
Jianfeng Gao
113
140
0
23 Aug 2021
PixelSynth: Generating a 3D-Consistent Experience from a Single Image
C. Rockwell
David Fouhey
Justin Johnson
VGen
146
85
0
12 Aug 2021
Unsupervised Discovery of Actions in Instructional Videos
A. Piergiovanni
A. Angelova
Michael S. Ryoo
Irfan Essa
36
3
0
28 Jun 2021
Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding
Mingjie Sun
Jimin Xiao
Eng Gee Lim
Si Liu
John Y. Goulermas
ObjD
82
162
0
08 Jun 2021
Transferring Knowledge from Text to Video: Zero-Shot Anticipation for Procedural Actions
Fadime Sener
Rishabh Saraf
Angela Yao
LM&Ro
59
12
0
06 Jun 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Prahal Arora
Masoumeh Aminzadeh
Christoph Feichtenhofer
Florian Metze
Luke Zettlemoyer
80
133
0
20 May 2021
Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos
Brian Chen
Andrew Rouditchenko
Kevin Duarte
Hilde Kuehne
Samuel Thomas
...
Rogerio Feris
David Harwath
James R. Glass
M. Picheny
Shih-Fu Chang
SSL
83
92
0
26 Apr 2021
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning
Luowei Zhou
Jingjing Liu
Yu Cheng
Zhe Gan
Lei Zhang
52
7
0
01 Apr 2021
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision
Andrew Shin
Masato Ishii
T. Narihira
133
39
0
06 Mar 2021
Learning Temporal Dynamics from Cycles in Narrated Video
Dave Epstein
Jiajun Wu
Cordelia Schmid
Chen Sun
AI4TS
104
14
0
07 Jan 2021