arXiv:1804.02748
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset
8 April 2018
Dima Damen
Hazel Doughty
G. Farinella
Sanja Fidler
Antonino Furnari
Evangelos Kazakos
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
Papers citing
"Scaling Egocentric Vision: The EPIC-KITCHENS Dataset"
50 / 60 papers shown
Title
RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language
Subrata Biswas
Mohammad Nur Hossain Khan
Bashima Islam
67
0
0
21 May 2025
EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video
Ryan Hoque
Peide Huang
David J. Yoon
Mouli Sivapurapu
Jian Zhang
125
0
0
16 May 2025
Hierarchical and Multimodal Data for Daily Activity Understanding
Ghazal Kaviani
Yavuz Yarici
Seulgi Kim
Mohit Prabhushankar
Ghassan AlRegib
Mashhour Solh
Ameya Patil
88
0
0
24 Apr 2025
How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions
Aditya Prakash
Benjamin Lundell
Dmitry Andreychuk
David Forsyth
Saurabh Gupta
H. Sawhney
108
2
0
16 Apr 2025
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Nvidia
Johan Bjorck
Fernando Castañeda
Nikita Cherniadev
Xingye Da
...
Ao Zhang
Hao Zhang
Yizhou Zhao
Ruijie Zheng
Yuke Zhu
VLM
120
37
0
18 Mar 2025
ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos
Peiran Wu
Yunze Liu
Chonghan Liu
Miao Liu
VGen
LRM
88
5
0
16 Mar 2025
VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation
Hanzhi Chen
Boyang Sun
Anran Zhang
Marc Pollefeys
Stefan Leutenegger
LM&Ro
103
0
0
10 Mar 2025
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos
Chen-Da Liu-Zhang
Lin Sui
Shuming Liu
Fangzhou Mu
Ziyi Wang
Bernard Ghanem
82
1
0
09 Mar 2025
GAT-Grasp: Gesture-Driven Affordance Transfer for Task-Aware Robotic Grasping
Ruixiang Wang
Huayi Zhou
Xinyue Yao
Guiliang Liu
Kui Jia
70
0
0
08 Mar 2025
Pre-training Auto-regressive Robotic Models with 4D Representations
Dantong Niu
Yuvan Sharma
Haoru Xue
Giscard Biamby
Junyi Zhang
Ziteng Ji
Trevor Darrell
Roei Herzig
117
1
0
18 Feb 2025
MS-Temba: Multi-Scale Temporal Mamba for Efficient Temporal Action Detection
Arkaprava Sinha
Monish Soundar Raj
Pu Wang
Ahmed Helmy
Srijan Das
Mamba
95
3
0
10 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
111
2
0
10 Jan 2025
Learning to Transfer Human Hand Skills for Robot Manipulations
Sangkwon Park
Seungho Lee
M. Choi
Jiye Lee
Jeonghwan Kim
Jisoo Kim
Hanbyul Joo
81
4
0
07 Jan 2025
Measuring Error Alignment for Decision-Making Systems
Binxia Xu
Antonis Bikakis
Daniel Onah
A. Vlachidis
Luke Dickens
69
0
0
03 Jan 2025
Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model
Yuanmin Huang
Jilan Xu
Baoqi Pei
Yuping He
Guo Chen
...
Kunpeng Li
C. Yuan
Yidan Wang
Yu Qiao
L. Wang
114
5
0
31 Dec 2024
Interacted Object Grounding in Spatio-Temporal Human-Object Interactions
Xiaoyang Liu
Boran Wen
Xinpeng Liu
Zizheng Zhou
Hongwei Fan
Cewu Lu
Lizhuang Ma
Yulong Chen
Yongqian Li
106
2
0
27 Dec 2024
Interact with me: Joint Egocentric Forecasting of Intent to Interact, Attitude and Social Actions
Tongfei Bian
Yiming Ma
Mathieu Chollet
Victor Sanchez
T. Guha
EgoV
126
1
0
21 Dec 2024
Do Language Models Understand Time?
Xi Ding
Lei Wang
232
1
0
18 Dec 2024
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation
Kun Wu
Chengkai Hou
Jiaming Liu
Zhengping Che
Xiaozhu Ju
...
Zhenyu Wang
Pengju An
Siyuan Qian
Shanghang Zhang
Jian Tang
LM&Ro
171
19
0
18 Dec 2024
Situational Scene Graph for Structured Human-centric Situation Understanding
Chinthani Sugandhika
Chen Li
Deepu Rajan
Basura Fernando
394
1
0
30 Oct 2024
On-Robot Reinforcement Learning with Goal-Contrastive Rewards
Ondrej Biza
Thomas Weng
Lingfeng Sun
Karl Schmeckpeper
Tarik Kelestemur
Yecheng Jason Ma
Robert Platt
Jan-Willem van de Meent
Lawson L. S. Wong
OffRL
70
0
0
25 Oct 2024
MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA
Hanrong Ye
Haotian Zhang
Erik Daxberger
Lin Chen
Zongyu Lin
...
Haoxuan You
Dan Xu
Zhe Gan
Jiasen Lu
Yinfei Yang
EgoV
MLLM
107
12
0
09 Oct 2024
A Comprehensive Review of Few-shot Action Recognition
Yuyang Wanyan
Xiaoshan Yang
Weiming Dong
Changsheng Xu
VLM
120
3
0
20 Jul 2024
The Collection of a Human Robot Collaboration Dataset for Cooperative Assembly in Glovebox Environments
Shivansh Sharma
Mathew Huang
Sanat Nair
Alan Wen
Christina Petlowany
Juston Moore
Selma Wanna
Mitch Pryor
112
0
0
19 Jul 2024
RMem: Restricted Memory Banks Improve Video Object Segmentation
Junbao Zhou
Ziqi Pang
Yu-Xiong Wang
VOS
91
7
0
12 Jun 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
204
49
0
23 May 2024
SADA: Semantic adversarial unsupervised domain adaptation for Temporal Action Localization
David Pujol-Perich
Albert Clapés
Sergio Escalera
75
0
0
20 Dec 2023
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan
Zitian Tang
Zhiqiu Yu
Chen Sun
94
1
0
30 Nov 2023
Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions
Jinzheng Zhao
Yong-mei Xu
Xinyuan Qian
Davide Berghi
Peipei Wu
Meng Cui
Jianyuan Sun
Philip J. B. Jackson
Wenwu Wang
BDL
81
7
0
23 Oct 2023
LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision
Jiani Huang
Ziyang Li
Mayur Naik
Ser-Nam Lim
98
3
0
15 Apr 2023
Robotic Telekinesis: Learning a Robotic Hand Imitator by Watching Humans on Youtube
Aravind Sivakumar
Kenneth Shaw
Deepak Pathak
148
101
0
21 Feb 2022
Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding
Mathew Monfort
Bowen Pan
K. Ramakrishnan
A. Andonian
Barry A. McNamara
A. Lascelles
Quanfu Fan
Dan Gutfreund
Rogerio Feris
A. Oliva
VLM
64
68
0
01 Nov 2019
Next-Active-Object prediction from Egocentric Videos
Antonino Furnari
Sebastiano Battiato
Kristen Grauman
G. Farinella
EgoV
41
96
0
10 Apr 2019
Action Completion: A Temporal Model for Moment Detection
Farnoosh Heidarivincheh
Majid Mirmehdi
Dima Damen
38
19
0
17 May 2018
Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos
Gunnar Sigurdsson
Abhinav Gupta
Cordelia Schmid
Ali Farhadi
Karteek Alahari
SLR
EgoV
46
163
0
25 Apr 2018
HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization
Hang Zhao
Antonio Torralba
Lorenzo Torresani
Zhicheng Yan
VLM
AI4TS
48
29
0
26 Dec 2017
From Lifestyle Vlogs to Everyday Interactions
David Fouhey
Weicheng Kuo
Alexei A. Efros
Jitendra Malik
60
124
0
06 Dec 2017
Deep Imitation Learning for Complex Manipulation Tasks from Virtual Reality Teleoperation
Tianhao Zhang
Zoe McCarthy
Owen Jow
Dennis Lee
Xi Chen
Ken Goldberg
Pieter Abbeel
SSL
69
653
0
12 Oct 2017
The "something something" video database for learning and evaluating visual common sense
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
...
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
VLM
79
1,516
0
13 Jun 2017
Towards Automatic Learning of Procedures from Web Instructional Videos
Luowei Zhou
Chenliang Xu
Jason J. Corso
EgoV
66
819
0
28 Mar 2017
Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video
Davide Moltisanti
Michael Wray
W. Mayol-Cuevas
Dima Damen
EgoV
44
31
0
27 Mar 2017
Combining Self-Supervised Learning and Imitation for Vision-Based Rope Manipulation
Ashvin Nair
Dian Chen
Pulkit Agrawal
Phillip Isola
Pieter Abbeel
Jitendra Malik
Sergey Levine
SSL
48
306
0
06 Mar 2017
Joint Discovery of Object States and Manipulation Actions
Jean-Baptiste Alayrac
Josef Sivic
Ivan Laptev
Simon Lacoste-Julien
53
79
0
09 Feb 2017
Speed/accuracy trade-offs for modern convolutional object detectors
Jonathan Huang
V. Rathod
Chen Sun
Menglong Zhu
Anoop Korattikara Balan
...
Ian S. Fischer
Z. Wojna
Yang Song
S. Guadarrama
Kevin Patrick Murphy
3DH
3DV
82
2,567
0
30 Nov 2016
Visual Dialog
Abhishek Das
Satwik Kottur
Khushi Gupta
Avi Singh
Deshraj Yadav
José M. F. Moura
Devi Parikh
Dhruv Batra
142
993
0
26 Nov 2016
YouTube-8M: A Large-Scale Video Classification Benchmark
Sami Abu-El-Haija
Nisarg Kothari
Joonseok Lee
Apostol Natsev
G. Toderici
Balakrishnan Varadarajan
Sudheendra Vijayanarasimhan
VLM
102
1,264
0
27 Sep 2016
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Limin Wang
Yuanjun Xiong
Zhe Wang
Yu Qiao
Dahua Lin
Xiaoou Tang
Luc Van Gool
ViT
95
3,814
0
02 Aug 2016
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
Gunnar Sigurdsson
Gül Varol
Xinyu Wang
Ali Farhadi
Ivan Laptev
Abhinav Gupta
VGen
88
1,238
0
06 Apr 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
1.6K
192,638
0
10 Dec 2015
MovieQA: Understanding Stories in Movies through Question-Answering
Makarand Tapaswi
Yukun Zhu
Rainer Stiefelhagen
Antonio Torralba
R. Urtasun
Sanja Fidler
98
736
0
09 Dec 2015