2307.16368
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
31 July 2023
Qi Zhao
Shijie Wang
Ce Zhang
Changcheng Fu
Minh Quan Do
Nakul Agarwal
Kwonjoon Lee
Chen Sun
LM&Ro
Papers citing
"AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?"
35 / 35 papers shown
Predicting Implicit Arguments in Procedural Video Instructions
Anil Batra
Laura Sevilla-Lara
Marcus Rohrbach
Frank Keller
49
0
0
27 May 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
Chong Chen
Jiadong Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
172
8
0
30 Apr 2025
Anticipate & Act : Integrating LLMs and Classical Planning for Efficient Task Execution in Household Environments
Raghav Arora
Shivam Singh
Karthik Swaminathan
Ahana Datta
Snehasis Banerjee
Brojeshwar Bhowmick
Krishna Murthy Jatavallabhula
Mohan Sridharan
M. Krishna
LLMAG
106
11
0
04 Feb 2025
Interact with me: Joint Egocentric Forecasting of Intent to Interact, Attitude and Social Actions
Tongfei Bian
Yiming Ma
Mathieu Chollet
Victor Sanchez
T. Guha
EgoV
145
1
0
21 Dec 2024
Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task
Hassan Ali
Philipp Allgeuer
Stefan Wermter
112
2
0
12 Apr 2024
Technical Report for Ego4D Long Term Action Anticipation Challenge 2023
Tatsuya Ishibashi
Kosuke Ono
Noriyuki Kugo
Yuji Sato
49
6
0
04 Jul 2023
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
126
84
0
22 May 2023
Learning and Verification of Task Structure in Instructional Videos
Medhini Narasimhan
Licheng Yu
Sean Bell
Ning Zhang
Trevor Darrell
102
19
0
23 Mar 2023
Rethinking Learning Approaches for Long-Term Action Anticipation
Megha Nawhal
Akash Abdu Jyothi
Greg Mori
AI4TS
49
28
0
20 Oct 2022
Text-Derived Knowledge Helps Vision: A Simple Cross-modal Distillation for Video-based Action Anticipation
Sayontan Ghosh
Tanvi Aggarwal
Minh Hoai
Niranjan Balasubramanian
VLM
58
4
0
12 Oct 2022
Video + CLIP Baseline for Ego4D Long-term Action Anticipation
Srijan Das
Michael S. Ryoo
VLM
CLIP
53
17
0
01 Jul 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
Henghui Zhao
Isma Hadji
Nikita Dvornik
Konstantinos G. Derpanis
Richard P. Wildes
Allan D. Jepson
78
46
0
04 May 2022
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Michael Ahn
Anthony Brohan
Noah Brown
Yevgen Chebotar
Omar Cortes
...
Ted Xiao
Peng Xu
Sichun Xu
Mengyuan Yan
Andy Zeng
LM&Ro
192
1,984
0
04 Apr 2022
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
...
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM
LRM
152
588
0
01 Apr 2022
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
Fadime Sener
Dibyadip Chatterjee
Daniel Shelepov
Kun He
Dipika Singhania
Robert Y. Wang
Angela Yao
VGen
81
220
0
28 Mar 2022
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Tomáš Souček
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
70
33
0
22 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
823
9,644
0
28 Jan 2022
Learning To Recognize Procedural Activities with Distant Supervision
Xudong Lin
Fabio Petroni
Gedas Bertasius
Marcus Rohrbach
Shih-Fu Chang
Lorenzo Torresani
82
87
0
26 Jan 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
399
1,109
0
13 Oct 2021
Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning
Jing Bi
Jiebo Luo
Chenliang Xu
110
49
0
05 Oct 2021
PlaTe: Visually-Grounded Planning with Transformers in Procedural Tasks
Jiankai Sun
De-An Huang
Bo Lu
Yunhui Liu
Bolei Zhou
Animesh Garg
51
56
0
10 Sep 2021
Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei
Maarten Bosma
Vincent Zhao
Kelvin Guu
Adams Wei Yu
Brian Lester
Nan Du
Andrew M. Dai
Quoc V. Le
ALM
UQCV
217
3,782
0
03 Sep 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
488
10,496
0
17 Jun 2021
Anticipative Video Transformer
Rohit Girdhar
Kristen Grauman
ViT
65
211
0
03 Jun 2021
Long-Term Anticipation of Activities with Cycle Consistency
Yazan Abu Farha
Qiuhong Ke
Bernt Schiele
Juergen Gall
AI4TS
68
44
0
02 Sep 2020
VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation
Jiyang Gao
Chen Sun
Hang Zhao
Yi Shen
Dragomir Anguelov
Congcong Li
Cordelia Schmid
128
814
0
08 May 2020
The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines
Dima Damen
Hazel Doughty
G. Farinella
Sanja Fidler
Antonino Furnari
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
71
235
0
29 Apr 2020
EGO-TOPO: Environment Affordances from Egocentric Video
Tushar Nagarajan
Yanghao Li
Christoph Feichtenhofer
Kristen Grauman
EgoV
121
123
0
14 Jan 2020
Procedure Planning in Instructional Videos
C. Chang
De-An Huang
Danfei Xu
Ehsan Adeli
Li Fei-Fei
Juan Carlos Niebles
75
103
0
02 Jul 2019
VideoGraph: Recognizing Minutes-Long Human Activities in Videos
Noureldien Hussein
E. Gavves
A. Smeulders
140
77
0
13 May 2019
Cross-task weakly supervised learning from instructional videos
Dimitri Zhukov
Jean-Baptiste Alayrac
R. G. Cinbis
David Fouhey
Ivan Laptev
Josef Sivic
SSL
128
250
0
19 Mar 2019
SlowFast Networks for Video Recognition
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
169
3,282
0
10 Dec 2018
When will you do what? - Anticipating Temporal Occurrences of Activities
Yazan Abu Farha
Alexander Richard
Juergen Gall
68
191
0
03 Apr 2018
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
235
8,037
0
22 May 2017
Towards Automatic Learning of Procedures from Web Instructional Videos
Luowei Zhou
Chenliang Xu
Jason J. Corso
EgoV
75
830
0
28 Mar 2017