ResearchTrend.AI
Cross-task weakly supervised learning from instructional videos
arXiv:1903.08225 (v2, latest)

19 March 2019
Dimitri Zhukov
Jean-Baptiste Alayrac
R. G. Cinbis
David Fouhey
Ivan Laptev
Josef Sivic
    SSL

Papers citing "Cross-task weakly supervised learning from instructional videos"

24 / 174 papers shown
Look Before you Speak: Visually Contextualized Utterances
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
99
67
0
10 Dec 2020
QuerYD: A video dataset with high-quality text and audio narrations
Andreea-Maria Oncescu
João F. Henriques
Yang Liu
Andrew Zisserman
Samuel Albanie
VGen
73
11
0
22 Nov 2020
Boundary-sensitive Pre-training for Temporal Localization in Videos
Mengmeng Xu
Juan-Manuel Perez-Rua
Victor Escorcia
Brais Martínez
Xiatian Zhu
Li Zhang
Guohao Li
Tao Xiang
80
61
0
21 Nov 2020
Action Duration Prediction for Segment-Level Alignment of Weakly-Labeled Videos
Reza Ghoddoosian
S. Sayed
V. Athitsos
AI4TS
28
7
0
20 Nov 2020
ActBERT: Learning Global-Local Video-Text Representations
Linchao Zhu
Yi Yang
ViT
134
423
0
14 Nov 2020
A Visuospatial Dataset for Naturalistic Verb Learning
Dylan Ebert
Ellie Pavlick
24
7
0
28 Oct 2020
Representation learning from videos in-the-wild: An object-centric approach
Rob Romijnders
Aravindh Mahendran
Michael Tschannen
Josip Djolonga
Marvin Ritter
N. Houlsby
Mario Lucic
OCL, SSL
70
8
0
06 Oct 2020
Full-Body Awareness from Partial Observations
C. Rockwell
David Fouhey
3DH
77
49
0
13 Aug 2020
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Jia Deng
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
113
48
0
29 Jul 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
88
142
0
16 Jun 2020
Understanding Human Hands in Contact at Internet Scale
Dandan Shan
Jiaqi Geng
Michelle Shu
David Fouhey
119
326
0
11 Jun 2020
On Evaluating Weakly Supervised Action Segmentation Methods
Yaser Souri
Alexander Richard
Luca Minciullo
Juergen Gall
47
7
0
19 May 2020
A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks
Angela S. Lin
Sudha Rao
Asli Celikyilmaz
E. Nouri
Chris Brockett
Debadeepta Dey
Bill Dolan
78
26
0
19 May 2020
Learning to Segment Actions from Observation and Narration
Daniel Fried
Jean-Baptiste Alayrac
Phil Blunsom
Chris Dyer
S. Clark
Aida Nematzadeh
122
32
0
07 May 2020
A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos
Frank F. Xu
Lei Ji
Botian Shi
Junyi Du
Graham Neubig
Yonatan Bisk
Nan Duan
41
21
0
02 May 2020
Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube
Jack Hessel
Zhenhai Zhu
Bo Pang
Radu Soricut
32
4
0
29 Apr 2020
Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation
Yansong Tang
Jiwen Lu
Jie Zhou
77
33
0
20 Mar 2020
UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation
Huaishao Luo
Lei Ji
Botian Shi
Haoyang Huang
Nan Duan
Tianrui Li
Jason Li
Xilin Chen
Ming Zhou
VLM
124
438
0
15 Feb 2020
Action Modifiers: Learning from Adverbs in Instructional Videos
Hazel Doughty
Ivan Laptev
W. Mayol-Cuevas
Dima Damen
103
30
0
13 Dec 2019
End-to-End Learning of Visual Representations from Uncurated Instructional Videos
Antoine Miech
Jean-Baptiste Alayrac
Lucas Smaira
Ivan Laptev
Josef Sivic
Andrew Zisserman
VGen, SSL
154
713
0
13 Dec 2019
ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
Mohit Shridhar
Jesse Thomason
Daniel Gordon
Yonatan Bisk
Winson Han
Roozbeh Mottaghi
Luke Zettlemoyer
Dieter Fox
LM&Ro
127
784
0
03 Dec 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
91
391
0
31 Jul 2019
Procedure Planning in Instructional Videos
C. Chang
De-An Huang
Danfei Xu
Ehsan Adeli
Li Fei-Fei
Juan Carlos Niebles
91
103
0
02 Jul 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
130
1,211
0
07 Jun 2019