2110.07058
Ego4D: Around the World in 3,000 Hours of Egocentric Video
13 October 2021
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
Rohit Girdhar
Jackson Hamburger
Hao Jiang
Miao Liu
Xingyu Liu
Miguel Martin
Tushar Nagarajan
Ilija Radosavovic
Santhosh Kumar Ramakrishnan
Fiona Ryan
J. Sharma
Michael Wray
Mengmeng Xu
Eric Z. Xu
Chen Zhao
Siddhant Bansal
Dhruv Batra
Vincent Cartillier
Sean Crane
Tien Do
Morrie Doulaty
Akshay Erapalli
Christoph Feichtenhofer
A. Fragomeni
Qichen Fu
A. Gebreselasie
Cristina González
James M. Hillis
Xuhua Huang
Yifei Huang
Wenqi Jia
Weslie Khoo
J. Kolár
Satwik Kottur
Anurag Kumar
F. Landini
Chao Li
Yanghao Li
Zhenqiang Li
K. Mangalam
Raghava Modhugu
Jonathan Munro
Tullie Murrell
Takumi Nishiyasu
Will Price
Paola Ruiz Puentes
Merey Ramazanova
Leda Sari
Kiran Somasundaram
Audrey Southerland
Yusuke Sugano
Ruijie Tao
Minh Vo
Yuchen Wang
Xindi Wu
Takuma Yagi
Ziwei Zhao
Yunyi Zhu
Pablo Arbelaez
David J. Crandall
Dima Damen
G. Farinella
Christian Fuegen
Guohao Li
V. Ithapu
C. V. Jawahar
Hanbyul Joo
Kris M. Kitani
Haizhou Li
Richard Newcombe
A. Oliva
H. Park
James M. Rehg
Yoichi Sato
Jianbo Shi
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
Papers citing "Ego4D: Around the World in 3,000 Hours of Egocentric Video"
50 / 786 papers shown
Title
Egocentric zone-aware action recognition across environments
Simone Alberto Peirone
Gabriele Goletto
M. Planamente
A. Bottino
Barbara Caputo
Giuseppe Averta
EgoV
31
0
0
21 Sep 2024
ReMEmbR: Building and Reasoning Over Long-Horizon Spatio-Temporal Memory for Robot Navigation
Abrar Anwar
John Welsh
Joydeep Biswas
Soha Pouya
Yan Chang
LM&Ro
31
9
0
20 Sep 2024
Should RAG Chatbots Forget Unimportant Conversations? Exploring Importance and Forgetting with Psychological Insights
Ryuichi Sumida
Koji Inoue
Tatsuya Kawahara
33
0
0
19 Sep 2024
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
Yihan Wu
Yifan Peng
Yichen Lu
Xuankai Chang
Ruihua Song
Shinji Watanabe
49
2
0
19 Sep 2024
Measuring Sound Symbolism in Audio-visual Models
Wei-Cheng Tseng
Yi-Jen Shih
David Harwath
Raymond Mooney
34
0
0
18 Sep 2024
WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild
Rolandos Alexandros Potamias
Jinglei Zhang
Jiankang Deng
S. Zafeiriou
3DH
36
10
0
18 Sep 2024
Mamba Fusion: Learning Actions Through Questioning
Zhikang Dong
Apoorva Beedu
Jason Sheinkopf
Irfan Essa
Mamba
70
2
0
17 Sep 2024
AMEGO: Active Memory from long EGOcentric videos
Gabriele Goletto
Tushar Nagarajan
Giuseppe Averta
Dima Damen
EgoV
33
4
0
17 Sep 2024
Pre-Training for 3D Hand Pose Estimation with Contrastive Learning on Large-Scale Hand Images in the Wild
Nie Lin
Takehiko Ohkawa
Mingfang Zhang
Yifei Huang
Ryosuke Furuta
Yoichi Sato
3DH
23
2
0
15 Sep 2024
Egocentric Speaker Classification in Child-Adult Dyadic Interactions: From Sensing to Computational Modeling
Tiantian Feng
Anfeng Xu
Xuan Shi
Somer Bishop
Shrikanth Narayanan
35
1
0
14 Sep 2024
Hand-Object Interaction Pretraining from Videos
Himanshu Gaurav Singh
Antonio Loquercio
Carmelo Sferrazza
Jane Wu
Haozhi Qi
Pieter Abbeel
Jitendra Malik
44
13
0
12 Sep 2024
Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges
Victoria Mingote
Alfonso Ortega
A. Miguel
Eduardo Lleida
30
0
0
09 Sep 2024
Towards Social AI: A Survey on Understanding Social Interactions
Sangmin Lee
Minzhi Li
Bolin Lai
Wenqi Jia
Fiona Ryan
...
Ozgur Kara
Bikram Boote
Weiyan Shi
Diyi Yang
James M. Rehg
39
4
0
05 Sep 2024
Multi-modal Situated Reasoning in 3D Scenes
Xiongkun Linghu
Jiangyong Huang
Xuesong Niu
Xiaojian Ma
Baoxiong Jia
Siyuan Huang
36
11
0
04 Sep 2024
Semantically Controllable Augmentations for Generalizable Robot Learning
Zoey Chen
Zhao Mandi
Homanga Bharadhwaj
Mohit Sharma
Shuran Song
Abhishek Gupta
Vikash Kumar
LM&Ro
34
5
0
02 Sep 2024
CogVLM2: Visual Language Models for Image and Video Understanding
Wenyi Hong
Weihan Wang
Ming Ding
Wenmeng Yu
Qingsong Lv
...
Debing Liu
Bin Xu
Juanzi Li
Yuxiao Dong
Jie Tang
VLM
MLLM
50
88
0
29 Aug 2024
Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding
Kaijing Ma
Haojian Huang
Jin Chen
Haodong Chen
Pengliang Ji
...
Han Fang
Chao Ban
Hao Sun
Mulin Chen
Xuelong Li
37
7
0
29 Aug 2024
GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy
Peiyan Li
Hongtao Wu
Yan Huang
Chilam Cheang
Liang Wang
Tao Kong
VGen
54
11
0
26 Aug 2024
Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning
Joey Hejna
Chethan Bhateja
Yichen Jian
Karl Pertsch
Dorsa Sadigh
25
15
0
26 Aug 2024
Learning Precise Affordances from Egocentric Videos for Robotic Manipulation
Gen Li
Nikolaos Tsagkas
Jifei Song
Ruaridh Mon-Williams
S. Vijayakumar
Kun Shao
Laura Sevilla-Lara
36
8
0
19 Aug 2024
SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition
Wiktor Mucha
Michael Wray
M. Kampel
25
0
0
19 Aug 2024
Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
Heeseung Yun
Ruohan Gao
Ishwarya Ananthabhotla
Anurag Kumar
Jacob Donley
Chao Li
Gunhee Kim
V. Ithapu
Calvin Murdock
45
1
0
09 Aug 2024
Weak-Annotation of HAR Datasets using Vision Foundation Models
Marius Bock
Kristof Van Laerhoven
Michael Moeller
30
1
0
09 Aug 2024
VideoQA in the Era of LLMs: An Empirical Study
Junbin Xiao
Nanxin Huang
Hangyu Qin
Dongyang Li
Yicong Li
...
Zhulin Tao
Jianxing Yu
Liang Lin
Tat-Seng Chua
Angela Yao
25
10
0
08 Aug 2024
LLaVA-OneVision: Easy Visual Task Transfer
Bo Li
Yuanhan Zhang
Dong Guo
Renrui Zhang
Feng Li
Hao Zhang
Kaichen Zhang
Yanwei Li
Ziwei Liu
Chunyuan Li
MLLM
SyDa
VLM
58
554
0
06 Aug 2024
Infusing Environmental Captions for Long-Form Video Language Grounding
Hyogun Lee
Soyeon Hong
Mujeen Sung
Jinwoo Choi
40
0
0
05 Aug 2024
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
Mrinal Verghese
Brian Chen
H. Eghbalzadeh
Tushar Nagarajan
Ruta Desai
LRM
45
1
0
04 Aug 2024
SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses
Chaolei Tan
Zihang Lin
Junfu Pu
Zhongang Qi
Wei-Yi Pei
Zhi Qu
Yexin Wang
Ying Shan
Wei-Shi Zheng
Jianfang Hu
AI4TS
43
0
0
03 Aug 2024
PEAR: Phrase-Based Hand-Object Interaction Anticipation
Zichen Zhang
Hongcheng Luo
Wei Zhai
N. A. Ushakov
Yu Kang
42
5
0
31 Jul 2024
Theia: Distilling Diverse Vision Foundation Models for Robot Learning
Jinghuan Shang
Karl Schmeckpeper
Brandon B. May
M. Minniti
Tarik Kelestemur
David Watkins
Laura Herlant
VLM
34
23
0
29 Jul 2024
EPD: Long-term Memory Extraction, Context-awared Planning and Multi-iteration Decision @ EgoPlan Challenge ICML 2024
Letian Shi
Qi Lv
Xiang Deng
Liqiang Nie
40
1
0
28 Jul 2024
Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation
Tz-Ying Wu
Kyle Min
Subarna Tripathi
Nuno Vasconcelos
EgoV
55
0
0
28 Jul 2024
HRP: Human Affordances for Robotic Pre-Training
Mohan Kumar Srirama
Sudeep Dasari
Shikhar Bahl
Abhinav Gupta
33
14
0
26 Jul 2024
PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations
Cheng Qian
Julen Urain
Kevin Zakka
Jan Peters
22
4
0
25 Jul 2024
HVM-1: Large-scale video models pretrained with nearly 5000 hours of human-like video data
Emin Orhan
VLM
SyDa
38
1
0
25 Jul 2024
Harnessing Temporal Causality for Advanced Temporal Action Detection
Shuming Liu
Lin Sui
Chen-Da Liu-Zhang
Fangzhou Mu
Chen Zhao
Guohao Li
CML
40
2
0
25 Jul 2024
Babel: A Scalable Pre-trained Model for Multi-Modal Sensing via Expandable Modality Alignment
Shenghong Dai
Shiqi Jiang
Yifan Yang
Ting Cao
Mo Li
Suman Banerjee
Lili Qiu
49
2
0
25 Jul 2024
OVR: A Dataset for Open Vocabulary Temporal Repetition Counting in Videos
Debidatta Dwibedi
Y. Aytar
Jonathan Tompson
Andrew Zisserman
34
2
0
24 Jul 2024
EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
Thomas Hummel
Shyamgopal Karthik
Mariana-Iuliana Georgescu
Zeynep Akata
EgoV
34
4
0
23 Jul 2024
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
Samuele Cornell
Taejin Park
Steve Huang
Christoph Boeddeker
Xuankai Chang
Matthew Maciejewski
Matthew Wiesner
Paola García
Shinji Watanabe
39
9
0
23 Jul 2024
Navigation Instruction Generation with BEV Perception and Large Language Models
Sheng Fan
Rui Liu
Wenguan Wang
Yi Yang
42
5
0
21 Jul 2024
Simultaneous Localization and Affordance Prediction for Tasks in Egocentric Video
Zachary Chavis
Hyun Soo Park
Stephen J. Guy
EgoV
44
0
0
18 Jul 2024
Gated Temporal Diffusion for Stochastic Long-Term Dense Anticipation
Olga Zatsarynna
Emad Bahrami
Yazan Abu Farha
Gianpiero Francesca
Juergen Gall
43
1
0
16 Jul 2024
Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba
Haoye Dong
Aviral Chharia
Wenbo Gou
Francisco Vicente Carrasco
Fernando de la Torre
Mamba
51
2
0
12 Jul 2024
Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization
Jinjie Mai
Abdullah Hamdi
Silvio Giancola
Chen Zhao
Guohao Li
EgoV
39
0
0
10 Jul 2024
ActionVOS: Actions as Prompts for Video Object Segmentation
Liangyang Ouyang
Ruicong Liu
Yifei Huang
Ryosuke Furuta
Yoichi Sato
VOS
42
2
0
10 Jul 2024
Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators
Wentao Zhang
Junliang Guo
Tianyu He
Li Zhao
Linli Xu
Jiang Bian
47
3
0
10 Jul 2024
Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
Mingfang Zhang
Yifei Huang
Ruicong Liu
Yoichi Sato
51
4
0
09 Jul 2024
Short-term Object Interaction Anticipation with Disentangled Object Detection @ Ego4D Short Term Object Interaction Anticipation Challenge
Hyunjin Cho
Dong un Kang
Se Young Chun
19
0
0
08 Jul 2024
CaRe-Ego: Contact-aware Relationship Modeling for Egocentric Interactive Hand-object Segmentation
Yuejiao Su
Yi Wang
Lap-Pui Chau
65
1
0
08 Jul 2024