ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2201.10990
  4. Cited By
Learning To Recognize Procedural Activities with Distant Supervision

Learning To Recognize Procedural Activities with Distant Supervision

26 January 2022
Xudong Lin
Fabio Petroni
Gedas Bertasius
Marcus Rohrbach
Shih-Fu Chang
Lorenzo Torresani
ArXivPDFHTML

Papers citing "Learning To Recognize Procedural Activities with Distant Supervision"

50 / 68 papers shown
Title
Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding
Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding
Dibyadip Chatterjee
Edoardo Remelli
Yale Song
Bugra Tekin
Abhay Mittal
...
Shreyas Hampali
Eric Sauser
Shugao Ma
Angela Yao
Fadime Sener
VLM
46
0
0
10 Apr 2025
REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding
REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding
Sakib Reza
Xiyun Song
Heather Yu
Zongfang Lin
Mohsen Moghaddam
Octavia Camps
29
0
0
07 Apr 2025
Safety Modulation: Enhancing Safety in Reinforcement Learning through Cost-Modulated Rewards
Safety Modulation: Enhancing Safety in Reinforcement Learning through Cost-Modulated Rewards
Hanping Zhang
Yuhong Guo
OffRL
38
0
0
03 Apr 2025
What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning
What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning
Chi-Hsi Kung
Frangil Ramirez
Juhyung Ha
Yi-Ting Chen
David J. Crandall
Yi-Hsuan Tsai
43
0
0
27 Mar 2025
Stitch-a-Recipe: Video Demonstration from Multistep Descriptions
Stitch-a-Recipe: Video Demonstration from Multistep Descriptions
Chi Hsuan Wu
Kumar Ashutosh
Kristen Grauman
DiffM
63
0
0
18 Mar 2025
Quantum EigenGame for excited state calculation
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
53
1
0
17 Mar 2025
VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining
VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining
Yunze Liu
Peiran Wu
C. Liang
Junxiao Shen
Limin Wang
Li Yi
Mamba
53
0
0
16 Mar 2025
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
Kevin Qinghong Lin
Mike Zheng Shou
VGen
153
1
0
12 Mar 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Shehreen Azad
Vibhav Vineet
Y. S. Rawat
VLM
131
1
0
11 Mar 2025
CLAD: Constrained Latent Action Diffusion for Vision-Language Procedure Planning
Lei Shi
Andreas Bulling
DiffM
52
1
0
09 Mar 2025
Leveraging Procedural Knowledge and Task Hierarchies for Efficient Instructional Video Pre-training
Leveraging Procedural Knowledge and Task Hierarchies for Efficient Instructional Video Pre-training
Karan Samel
Nitish Sontakke
Irfan Essa
46
0
0
24 Feb 2025
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos
Luigi Seminara
G. Farinella
Antonino Furnari
64
7
0
10 Jan 2025
ACE: Action Concept Enhancement of Video-Language Models in Procedural
  Videos
ACE: Action Concept Enhancement of Video-Language Models in Procedural Videos
Reza Ghoddoosian
Nakul Agarwal
Isht Dwivedi
Behzad Darisuh
68
0
0
23 Nov 2024
Video Token Merging for Long-form Video Understanding
Video Token Merging for Long-form Video Understanding
Seon-Ho Lee
Jue Wang
Zhikang Zhang
D. Fan
Xinyu Li
40
5
0
31 Oct 2024
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach
Mathilde Caron
Alireza Fathi
Cordelia Schmid
Ahmet Iscen
34
1
0
31 Oct 2024
VEDIT: Latent Prediction Architecture For Procedural Video
  Representation Learning
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning
Han Lin
Tushar Nagarajan
Nicolas Ballas
Mido Assran
Mojtaba Komeili
Mohit Bansal
Koustuv Sinha
AI4TS
52
3
0
04 Oct 2024
Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short
  Videos
Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos
Jianrui Zhang
Mu Cai
Yong Jae Lee
32
6
0
03 Oct 2024
Learning to Localize Actions in Instructional Videos with LLM-Based
  Multi-Pathway Text-Video Alignment
Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment
Yuxiao Chen
K. Li
Wentao Bao
Deep Patel
Yu Kong
Martin Renqiang Min
Dimitris N. Metaxas
DiffM
33
1
0
22 Sep 2024
Enhancing Long Video Understanding via Hierarchical Event-Based Memory
Enhancing Long Video Understanding via Hierarchical Event-Based Memory
Dingxin Cheng
Mingda Li
Jingyu Liu
Yongxin Guo
Bin Jiang
Qingbin Liu
Xi Chen
Bo Zhao
32
4
0
10 Sep 2024
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths
  Vision Computation
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
Shiwei Wu
Joya Chen
Kevin Qinghong Lin
Qimeng Wang
Yan Gao
Qianli Xu
Tong Bill Xu
Yao Hu
Enhong Chen
Mike Zheng Shou
VLM
47
12
0
29 Aug 2024
ExpertAF: Expert Actionable Feedback from Video
ExpertAF: Expert Actionable Feedback from Video
Kumar Ashutosh
Tushar Nagarajan
Georgios Pavlakos
Kris M. Kitani
Kristen Grauman
VGen
44
2
0
01 Aug 2024
Open Vocabulary Multi-Label Video Classification
Open Vocabulary Multi-Label Video Classification
Rohit Gupta
Mamshad Nayeem Rizve
Jayakrishnan Unnikrishnan
Ashish Tawari
Son Tran
Mubarak Shah
Benjamin Z. Yao
Trishul M. Chilimbi
VLM
62
1
0
12 Jul 2024
MatchTime: Towards Automatic Soccer Game Commentary Generation
MatchTime: Towards Automatic Soccer Game Commentary Generation
Jiayuan Rao
Haoning Wu
Chang-rui Liu
Yanfeng Wang
Weidi Xie
38
7
0
26 Jun 2024
VideoLLM-online: Online Video Large Language Model for Streaming Video
VideoLLM-online: Online Video Large Language Model for Streaming Video
Joya Chen
Zhaoyang Lv
Shiwei Wu
Kevin Qinghong Lin
Chenan Song
Difei Gao
Jia-Wei Liu
Ziteng Gao
Dongxing Mao
Mike Zheng Shou
MLLM
MoMe
45
49
0
17 Jun 2024
Step Differences in Instructional Video
Step Differences in Instructional Video
Tushar Nagarajan
Lorenzo Torresani
VGen
32
5
0
24 Apr 2024
Mamba-360: Survey of State Space Models as Transformer Alternative for
  Long Sequence Modelling: Methods, Applications, and Challenges
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
46
38
0
24 Apr 2024
PD-Insighter: A Visual Analytics System to Monitor Daily Actions for
  Parkinson's Disease Treatment
PD-Insighter: A Visual Analytics System to Monitor Daily Actions for Parkinson's Disease Treatment
Jade Kandel
Chelsea Duppen
Qian Zhang
Howard Jiang
Angelos Angelopoulos
...
Pranav Wagh
D. Szafir
Henry Fuchs
Michael Lewek
Danielle Albers Szafir
9
3
0
16 Apr 2024
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video
  Understanding
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He
Hengduo Li
Young Kyun Jang
Menglin Jia
Xuefei Cao
Ashish Shah
Abhinav Shrivastava
Ser-Nam Lim
MLLM
81
88
0
08 Apr 2024
VideoMamba: State Space Model for Efficient Video Understanding
VideoMamba: State Space Model for Efficient Video Understanding
Kunchang Li
Xinhao Li
Yi Wang
Yinan He
Yali Wang
Limin Wang
Yu Qiao
Mamba
37
180
0
11 Mar 2024
SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional
  Videos
SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos
Yulei Niu
Wenliang Guo
Long Chen
Xudong Lin
Shih-Fu Chang
47
9
0
03 Mar 2024
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
Shufan Li
Harkanwar Singh
Aditya Grover
Mamba
92
56
0
08 Feb 2024
Detours for Navigating Instructional Videos
Detours for Navigating Instructional Videos
Kumar Ashutosh
Zihui Xue
Tushar Nagarajan
Kristen Grauman
26
6
0
03 Jan 2024
CaptainCook4D: A dataset for understanding errors in procedural
  activities
CaptainCook4D: A dataset for understanding errors in procedural activities
Rohith Peddi
Shivvrat Arya
B. Challa
Likhitha Pallapothula
Akshay Vyas
...
Vasundhara Komaragiri
Eric D. Ragan
Nicholas Ruozzi
Yu Xiang
Vibhav Gogate
57
8
0
22 Dec 2023
A Strong Baseline for Temporal Video-Text Alignment
A Strong Baseline for Temporal Video-Text Alignment
Zeqian Li
Qirui Chen
Tengda Han
Ya-Qin Zhang
Yanfeng Wang
Weidi Xie
AI4TS
VGen
35
5
0
21 Dec 2023
Generating Illustrated Instructions
Generating Illustrated Instructions
Sachit Menon
Ishan Misra
Rohit Girdhar
DiffM
31
4
0
07 Dec 2023
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities
  Using Web Instructional Videos
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos
Takehiko Ohkawa
Takuma Yagi
Taichi Nishimura
Ryosuke Furuta
Atsushi Hashimoto
Yoshitaka Ushiku
Yoichi Sato
EgoV
44
8
0
28 Nov 2023
Efficient Pre-training for Localized Instruction Generation of Videos
Efficient Pre-training for Localized Instruction Generation of Videos
Anil Batra
Davide Moltisanti
Laura Sevilla-Lara
Marcus Rohrbach
Frank Keller
27
0
0
27 Nov 2023
IndustReal: A Dataset for Procedure Step Recognition Handling Execution
  Errors in Egocentric Videos in an Industrial-Like Setting
IndustReal: A Dataset for Procedure Step Recognition Handling Execution Errors in Egocentric Videos in an Industrial-Like Setting
Tim J. Schoonbeek
Tim Houben
H. Onvlee
Peter H. N. de With
Fons van der Sommen
44
22
0
26 Oct 2023
MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain
  Everyday Tasks
MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks
Jingyuan Qi
Minqian Liu
Ying Shen
Zhiyang Xu
Lifu Huang
LRM
VGen
29
2
0
08 Oct 2023
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
Nina Shvetsova
Anna Kukleva
Xudong Hong
Christian Rupprecht
Bernt Schiele
Hilde Kuehne
40
25
0
07 Oct 2023
Masked Diffusion with Task-awareness for Procedure Planning in
  Instructional Videos
Masked Diffusion with Task-awareness for Procedure Planning in Instructional Videos
Fen Fang
Yun Liu
Ali Koksal
Qianli Xu
Joo-Hwee Lim
VGen
DiffM
26
5
0
14 Sep 2023
Every Mistake Counts in Assembly
Every Mistake Counts in Assembly
Guodong Ding
Fadime Sener
Shugao Ma
Angela Yao
32
12
0
31 Jul 2023
AntGPT: Can Large Language Models Help Long-term Action Anticipation
  from Videos?
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
Qi Zhao
Shijie Wang
Ce Zhang
Changcheng Fu
Minh Quan Do
Nakul Agarwal
Kwonjoon Lee
Chen Sun
LM&Ro
51
49
0
31 Jul 2023
Video-Mined Task Graphs for Keystep Recognition in Instructional Videos
Video-Mined Task Graphs for Keystep Recognition in Instructional Videos
Kumar Ashutosh
Santhosh Kumar Ramakrishnan
Triantafyllos Afouras
Kristen Grauman
23
23
0
17 Jul 2023
Language-Guided Music Recommendation for Video via Prompt Analogies
Language-Guided Music Recommendation for Video via Prompt Analogies
Daniel McKee
Justin Salamon
Josef Sivic
Bryan C. Russell
VGen
28
26
0
15 Jun 2023
Learning Fine-grained View-Invariant Representations from Unpaired
  Ego-Exo Videos via Temporal Alignment
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment
Zihui Xue
Kristen Grauman
EgoV
33
31
0
08 Jun 2023
Learning to Ground Instructional Articles in Videos through Narrations
Learning to Ground Instructional Articles in Videos through Narrations
E. Mavroudi
Triantafyllos Afouras
Lorenzo Torresani
DiffM
33
21
0
06 Jun 2023
Non-Sequential Graph Script Induction via Multimedia Grounding
Non-Sequential Graph Script Induction via Multimedia Grounding
Yu Zhou
Sha Li
Manling Li
Xudong Lin
Shih-Fu Chang
Mohit Bansal
Heng Ji
25
8
0
27 May 2023
How to Index Item IDs for Recommendation Foundation Models
How to Index Item IDs for Recommendation Foundation Models
Wenyue Hua
Shuyuan Xu
Yingqiang Ge
Yongfeng Zhang
24
103
0
11 May 2023
Verbs in Action: Improving verb understanding in video-language models
Verbs in Action: Improving verb understanding in video-language models
Liliane Momeni
Mathilde Caron
Arsha Nagrani
Andrew Zisserman
Cordelia Schmid
32
70
0
13 Apr 2023
12
Next