Cross-task weakly supervised learning from instructional videos

19 March 2019
Dimitri Zhukov
Jean-Baptiste Alayrac
R. G. Cinbis
David Fouhey
Ivan Laptev
Josef Sivic
    SSL

Papers citing "Cross-task weakly supervised learning from instructional videos"

50 / 174 papers shown
Title
InstructionBench: An Instructional Video Understanding Benchmark
Haiwan Wei
Yitian Yuan
Xiaohan Lan
Wei Ke
Lin Ma
ELM
88
3
0
01 Jul 2025
Anomaly Detection and Generation with Diffusion Models: A Survey
Yang Liu
Jing Liu
Chengfang Li
Rui Xi
W. Li
Liang Cao
Jin Wang
L. Yang
Junsong Yuan
Wei Zhou
DiffM, MedIm
59
0
0
11 Jun 2025
PhysLab: A Benchmark Dataset for Multi-Granularity Visual Parsing of Physics Experiments
Minghao Zou
Qingtian Zeng
Yongping Miao
Shangkun Liu
Zilong Wang
Hantao Liu
Wei Zhou
14
0
0
07 Jun 2025
WorldPrediction: A Benchmark for High-level World Modeling and Long-horizon Procedural Planning
Delong Chen
Willy Chung
Yejin Bang
Ziwei Ji
Pascale Fung
VGen, LM&Ro
62
0
0
04 Jun 2025
Predicting Implicit Arguments in Procedural Video Instructions
Anil Batra
Laura Sevilla-Lara
Marcus Rohrbach
Frank Keller
51
0
0
27 May 2025
$I^2G$: Generating Instructional Illustrations via Text-Conditioned Diffusion
Jing Bi
Pinxin Liu
Ali Vosoughi
Jiarui Wu
Jinxi He
Chenliang Xu
DiffM
48
0
0
22 May 2025
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
Yunseok Jang
Yeda Song
Sungryull Sohn
Lajanugen Logeswaran
Tiange Luo
Dong-Ki Kim
Kyunghoon Bae
Honglak Lee
VGen
62
0
0
19 May 2025
HiERO: understanding the hierarchy of human behavior enhances reasoning on egocentric videos
Simone Alberto Peirone
Francesca Pistilli
Giuseppe Averta
61
0
0
19 May 2025
Ask2Loc: Learning to Locate Instructional Visual Answers by Asking Questions
Chang Zong
Bin Li
Shoujun Zhou
Jian Wan
Lei Zhang
464
0
0
22 Apr 2025
Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding
Dibyadip Chatterjee
Edoardo Remelli
Yale Song
Bugra Tekin
Abhay Mittal
...
Shreyas Hampali
Eric Sauser
Shugao Ma
Angela Yao
Fadime Sener
VLM
99
0
0
10 Apr 2025
Learning Activity View-invariance Under Extreme Viewpoint Changes via Curriculum Knowledge Distillation
Arjun Somayazulu
E. Mavroudi
Changan Chen
Lorenzo Torresani
Kristen Grauman
68
0
0
07 Apr 2025
Context-Enhanced Memory-Refined Transformer for Online Action Detection
Zhanzhong Pang
Fadime Sener
Angela Yao
OffRL
125
2
0
24 Mar 2025
Stitch-a-Recipe: Video Demonstration from Multistep Descriptions
Chi Hsuan Wu
Kumar Ashutosh
Kristen Grauman
DiffM
104
0
0
18 Mar 2025
Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization
Zongshang Pang
Mayu Otani
Yuta Nakashima
128
0
0
12 Mar 2025
CLAD: Constrained Latent Action Diffusion for Vision-Language Procedure Planning
Lei Shi
Andreas Bulling
DiffM
97
2
0
09 Mar 2025
Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos
Luigi Seminara
G. Farinella
Antonino Furnari
126
0
0
25 Feb 2025
Leveraging Procedural Knowledge and Task Hierarchies for Efficient Instructional Video Pre-training
Karan Samel
Nitish Sontakke
Irfan Essa
76
1
0
24 Feb 2025
Learning Human Skill Generators at Key-Step Levels
Yilu Wu
Chenhui Zhu
Shuai Wang
Hanlin Wang
Jing Wang
Zhaoxiang Zhang
Limin Wang
VGen
200
0
0
12 Feb 2025
TimeLogic: A Temporal Logic Benchmark for Video QA
S. Swetha
Hilde Kuehne
Mubarak Shah
57
1
0
13 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
174
3
0
10 Jan 2025
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos
Luigi Seminara
G. Farinella
Antonino Furnari
129
9
0
10 Jan 2025
SUTrack: Towards Simple and Unified Single Object Tracking
Xin Chen
Ben Kang
Wanting Geng
Jiawen Zhu
Yebin Liu
Dong Wang
Huchuan Lu
VOT, ViT
103
5
0
26 Dec 2024
VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting
Muhammet Furkan Ilaslan
Ali Koksal
Kevin Qinghong Lin
Burak Satar
Mike Zheng Shou
Qianli Xu
LM&Ro
115
0
0
16 Dec 2024
IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos
Yunong Liu
Cristobal Eyzaguirre
Manling Li
Shubh Khanna
Juan Carlos Niebles
Vineeth Ravi
Saumitra Mishra
Weiyu Liu
Jiajun Wu
119
1
0
18 Nov 2024
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos
Sagnik Majumder
Tushar Nagarajan
Ziad Al-Halah
Reina Pradhan
Kristen Grauman
80
1
0
13 Nov 2024
TI-PREGO: Chain of Thought and In-Context Learning for Online Mistake Detection in PRocedural EGOcentric Videos
Leonardo Plini
Luca Scofano
Edoardo De Matteis
Guido Maria D'Amely di Melendugno
Alessandro Flaborea
Andrea Sanchietti
G. Farinella
Fabio Galasso
Antonino Furnari
EgoV, LRM
104
1
0
04 Nov 2024
Egocentric and Exocentric Methods: A Short Survey
Anirudh Thatipelli
Shao-Yuan Lo
Amit K. Roy-Chowdhury
EgoV
86
2
0
27 Oct 2024
Human Action Anticipation: A Survey
Bolin Lai
Sam Toyer
Tushar Nagarajan
Rohit Girdhar
S. Zha
James M. Rehg
Kris Kitani
Kristen Grauman
Ruta Desai
Miao Liu
AI4TS
74
1
0
17 Oct 2024
EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos Referring to Procedural Texts
Yuto Haneji
Taichi Nishimura
Hirotaka Kameko
Keisuke Shirai
Tomoya Yoshida
Keiya Kajimura
Koki Yamamoto
Taiyu Cui
Tomohiro Nishimoto
Shinsuke Mori
EgoV
84
2
0
07 Oct 2024
Optimising for the Unknown: Domain Alignment for Cephalometric Landmark Detection
Julian Wyatt
Irina Voiculescu
43
0
0
06 Oct 2024
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning
Han Lin
Tushar Nagarajan
Nicolas Ballas
Mido Assran
Mojtaba Komeili
Joey Tianyi Zhou
Koustuv Sinha
AI4TS
110
5
0
04 Oct 2024
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
Md. Mohaiminul Islam
Tushar Nagarajan
Huiyu Wang
Fu-Jen Chu
Kris Kitani
Gedas Bertasius
Xitong Yang
72
4
0
30 Sep 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Ye Liu
Zongyang Ma
Zhongang Qi
Yang Wu
Ying Shan
Chang Wen Chen
107
23
0
26 Sep 2024
Causal Temporal Representation Learning with Nonstationary Sparse Transition
Xiangchen Song
Zijian Li
Guangyi Chen
Yujia Zheng
Yewen Fan
Xinshuai Dong
Kun Zhang
CML
52
2
0
05 Sep 2024
Box2Flow: Instance-based Action Flow Graphs from Videos
Jiatong Li
Kalliopi Basioti
Vladimir Pavlovic
118
0
0
30 Aug 2024
Diffusion Model for Planning: A Systematic Literature Review
Toshihide Ubukata
Jialong Li
Kenji Tei
DiffM, MedIm
140
9
0
16 Aug 2024
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark
Koki Maeda
Tosho Hirasawa
Atsushi Hashimoto
Jun Harashima
Leszek Rybicki
Yusuke Fukasawa
Yoshitaka Ushiku
101
0
0
05 Aug 2024
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
Mrinal Verghese
Brian Chen
H. Eghbalzadeh
Tushar Nagarajan
Ruta Desai
LRM
80
1
0
04 Aug 2024
ExpertAF: Expert Actionable Feedback from Video
Kumar Ashutosh
Tushar Nagarajan
Georgios Pavlakos
Kris Kitani
Kristen Grauman
VGen
151
3
0
01 Aug 2024
Open-Event Procedure Planning in Instructional Videos
Yilu Wu
Hanlin Wang
Jing Wang
Limin Wang
88
1
0
06 Jul 2024
GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension
Jiafeng Liang
Shixin Jiang
Zekun Wang
Haojie Pan
Zerui Chen
Zheng Chu
Ming Liu
Ruiji Fu
Zhongyuan Wang
Bing Qin
64
3
0
26 Jun 2024
A Survey of Video Datasets for Grounded Event Understanding
Kate Sanders
Benjamin Van Durme
84
4
0
14 Jun 2024
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Thong Nguyen
Yi Bin
Junbin Xiao
Leigang Qu
Yicong Li
Jay Zhangjie Wu
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
166
13
1
09 Jun 2024
Step Differences in Instructional Video
Tushar Nagarajan
Lorenzo Torresani
VGen
101
5
0
24 Apr 2024
Sequential Compositional Generalization in Multimodal Models
Semih Yagcioglu
Osman Batur İnce
Aykut Erdem
Erkut Erdem
Desmond Elliott
Deniz Yuret
69
1
0
18 Apr 2024
PREGO: online mistake detection in PRocedural EGOcentric videos
Alessandro Flaborea
Guido Maria D'Amely di Melendugno
Leonardo Plini
Luca Scofano
Edoardo De Matteis
Antonino Furnari
G. Farinella
Yuta Kyuragi
EgoV
101
13
0
02 Apr 2024
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva
Fadime Sener
Edoardo Remelli
Bugra Tekin
Eric Sauser
Bernt Schiele
Shugao Ma
VLM, EgoV
73
2
0
28 Mar 2024
Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment
Angchi Xu
Wei-Shi Zheng
83
5
0
28 Mar 2024
RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos
Ali Zare
Yulei Niu
Hammad A. Ayyubi
Shih-Fu Chang
82
1
0
27 Mar 2024
ActionDiffusion: An Action-aware Diffusion Model for Procedure Planning in Instructional Videos
Lei Shi
Paul-Christian Bürkner
Andreas Bulling
DiffM, VGen
78
4
0
13 Mar 2024