Ego4D: Around the World in 3,000 Hours of Egocentric Video

13 October 2021
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
Rohit Girdhar
Jackson Hamburger
Hao Jiang
Miao Liu
Xingyu Liu
Miguel Martin
Tushar Nagarajan
Ilija Radosavovic
Santhosh Kumar Ramakrishnan
Fiona Ryan
J. Sharma
Michael Wray
Mengmeng Xu
Eric Z. Xu
Chen Zhao
Siddhant Bansal
Dhruv Batra
Vincent Cartillier
Sean Crane
Tien Do
Morrie Doulaty
Akshay Erapalli
Christoph Feichtenhofer
A. Fragomeni
Qichen Fu
A. Gebreselasie
Cristina González
James M. Hillis
Xuhua Huang
Yifei Huang
Wenqi Jia
Weslie Khoo
J. Kolár
Satwik Kottur
Anurag Kumar
F. Landini
Chao Li
Yanghao Li
Zhenqiang Li
K. Mangalam
Raghava Modhugu
Jonathan Munro
Tullie Murrell
Takumi Nishiyasu
Will Price
Paola Ruiz Puentes
Merey Ramazanova
Leda Sari
Kiran Somasundaram
Audrey Southerland
Yusuke Sugano
Ruijie Tao
Minh Vo
Yuchen Wang
Xindi Wu
Takuma Yagi
Ziwei Zhao
Yunyi Zhu
Pablo Arbelaez
David J. Crandall
Dima Damen
G. Farinella
Christian Fuegen
Guohao Li
V. Ithapu
C. V. Jawahar
Hanbyul Joo
Kris Kitani
Haizhou Li
Richard Newcombe
A. Oliva
H. Park
James M. Rehg
Yoichi Sato
Jianbo Shi
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
    EgoV

Papers citing "Ego4D: Around the World in 3,000 Hours of Egocentric Video"

50 / 791 papers shown
Text-driven Affordance Learning from Egocentric Vision
Tomoya Yoshida
Shuhei Kurita
Taichi Nishimura
Shinsuke Mori
46
5
0
03 Apr 2024
SnAG: Scalable and Accurate Video Grounding
Fangzhou Mu
Sicheng Mo
Yin Li
44
8
0
02 Apr 2024
SUGAR: Pre-training 3D Visual Representations for Robotics
Shizhe Chen
Ricardo Garcia Pinel
Ivan Laptev
Cordelia Schmid
56
14
0
01 Apr 2024
360+x: A Panoptic Multi-modal Scene Understanding Dataset
Hao Chen
Yuqi Hou
Chenyuan Qu
Irene Testini
Xiaohan Hong
Jianbo Jiao
34
7
0
01 Apr 2024
$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
Ye Liu
Jixuan He
Wanhua Li
Junsik Kim
D. Wei
Hanspeter Pfister
Chang Wen Chen
51
13
0
31 Mar 2024
Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods
Yuji Cao
Huan Zhao
Yuheng Cheng
Ting Shu
Guolong Liu
Gaoqi Liang
Junhua Zhao
Yun Li
LLMAG
KELM
OffRL
LM&Ro
48
51
0
30 Mar 2024
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva
Fadime Sener
Edoardo Remelli
Bugra Tekin
Eric Sauser
Bernt Schiele
Shugao Ma
VLM
EgoV
45
1
0
28 Mar 2024
Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics
Norman Di Palo
Edward Johns
49
33
0
28 Mar 2024
OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion
Xinyu Zhan
Lixin Yang
Yifei Zhao
Kangrui Mao
Hanlin Xu
Zenan Lin
Kailin Li
Cewu Lu
38
20
0
28 Mar 2024
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM
Wonkyun Kim
Changin Choi
Wonseok Lee
Wonjong Rhee
VLM
47
51
0
27 Mar 2024
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
Yifei Huang
Guo Chen
Jilan Xu
Mingfang Zhang
Lijin Yang
...
Hongjie Zhang
Lu Dong
Yali Wang
Limin Wang
Yu Qiao
EgoV
71
38
0
24 Mar 2024
Explore until Confident: Efficient Exploration for Embodied Question Answering
Allen Z. Ren
Jaden Clark
Anushri Dixit
Masha Itkina
Anirudha Majumdar
Dorsa Sadigh
47
28
0
23 Mar 2024
DITTO: Demonstration Imitation by Trajectory Transformation
Nick Heppert
Max Argus
Tim Welschehold
Thomas Brox
Abhinav Valada
69
16
0
22 Mar 2024
Selective, Interpretable, and Motion Consistent Privacy Attribute Obfuscation for Action Recognition
Filip Ilic
Henghui Zhao
Thomas Pock
Richard P. Wildes
PICV
AAML
44
2
0
19 Mar 2024
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Alexander Khazatsky
Karl Pertsch
Suraj Nair
Ashwin Balakrishna
Sudeep Dasari
...
Thomas Kollar
Sergey Levine
Chelsea Finn
61
182
0
19 Mar 2024
HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
Mengqi Zhang
Yang Fu
Zheng Ding
Sifei Liu
Zhuowen Tu
Xiaolong Wang
44
17
0
18 Mar 2024
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
Yue Fan
Xiaojian Ma
Rujie Wu
Yuntao Du
Jiaqi Li
Zhi Gao
Qing Li
VLM
LLMAG
48
57
0
18 Mar 2024
A Survey of IMU Based Cross-Modal Transfer Learning in Human Activity Recognition
Abhi Kamboj
Minh Do
29
1
0
17 Mar 2024
On the Utility of 3D Hand Poses for Action Recognition
Md Salman Shamil
Dibyadip Chatterjee
Fadime Sener
Shugao Ma
Angela Yao
40
5
0
14 Mar 2024
Generalized Predictive Model for Autonomous Driving
Jiazhi Yang
Shenyuan Gao
Yihang Qiu
Li Chen
Tianyu Li
...
Ping Luo
Jun Zhang
Andreas Geiger
Yu Qiao
Hongyang Li
VGen
73
57
0
14 Mar 2024
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen
Yifei Huang
Jilan Xu
Baoqi Pei
Zhe Chen
Zhiqi Li
Jiahao Wang
Kunchang Li
Tong Lu
Limin Wang
Mamba
64
73
0
14 Mar 2024
VANP: Learning Where to See for Navigation with Self-Supervised Vision-Action Pre-Training
Mohammad Nazeri
Junzhe Wang
Amirreza Payandeh
Xuesu Xiao
SSL
ViT
52
6
0
12 Mar 2024
Fast-Forward Reality: Authoring Error-Free Context-Aware Policies with Real-Time Unit Tests in Extended Reality
Xun Qian
Tianyi Wang
Xuhai Xu
Tanya R Jonker
Kashyap Todi
19
2
0
12 Mar 2024
DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation
Chen Wang
Haochen Shi
Weizhuo Wang
Ruohan Zhang
Fei-Fei Li
Karen Liu
58
107
0
12 Mar 2024
POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World
Boshen Xu
Sipeng Zheng
Qin Jin
52
7
0
09 Mar 2024
Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos
Tarun Kalluri
Bodhisattwa Prasad Majumder
Manmohan Chandraker
VLM
47
4
0
08 Mar 2024
Spatiotemporal Predictive Pre-training for Robotic Motor Control
Jiange Yang
Bei Liu
Jianlong Fu
Bocheng Pan
Gangshan Wu
Limin Wang
53
10
0
08 Mar 2024
EgoPAT3Dv2: Predicting 3D Action Target from 2D Egocentric Vision for Human-Robot Interaction
Irving Fang
Yuzhong Chen
Yifan Wang
Jianghan Zhang
Qiushi Zhang
...
Xibo He
Weibo Gao
Hao Su
Yiming Li
Chen Feng
EgoV
34
2
0
08 Mar 2024
Embodied Understanding of Driving Scenarios
Yunsong Zhou
Linyan Huang
Qingwen Bu
Jia Zeng
Tianyu Li
Hang Qiu
Hongzi Zhu
Minyi Guo
Yu Qiao
Hongyang Li
LM&Ro
62
31
0
07 Mar 2024
Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation
Ruicong Liu
Takehiko Ohkawa
Mingfang Zhang
Yoichi Sato
43
9
0
07 Mar 2024
Human I/O: Towards a Unified Approach to Detecting Situational Impairments
Xingyu Bruce Liu
Jiahao Nick Li
David Kim
Xiang 'Anthony' Chen
Andrea Colaço
42
13
0
06 Mar 2024
A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives
Simone Alberto Peirone
Francesca Pistilli
A. Alliegro
Giuseppe Averta
EgoV
35
5
0
05 Mar 2024
Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations
Sangmin Lee
Bolin Lai
Fiona Ryan
Bikram Boote
James M. Rehg
28
8
0
04 Mar 2024
A SOUND APPROACH: Using Large Language Models to generate audio descriptions for egocentric text-audio retrieval
Andreea-Maria Oncescu
João F. Henriques
Andrew Zisserman
Samuel Albanie
A. Sophia Koepke
28
5
0
29 Feb 2024
Trends, Applications, and Challenges in Human Attention Modelling
Giuseppe Cartella
Marcella Cornia
Vittorio Cuculo
Alessandro D’Amelio
Dario Zanca
Giuseppe Boccignone
Rita Cucchiara
40
6
0
28 Feb 2024
DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning
Jianxiong Li
Jinliang Zheng
Yinan Zheng
Liyuan Mao
Xiaoming Hu
...
Jihao Liu
Yu Liu
Jingjing Liu
Ya Zhang
Xianyuan Zhan
LM&Ro
OffRL
37
9
0
28 Feb 2024
OSCaR: Object State Captioning and State Change Representation
Nguyen Nguyen
Jing Bi
A. Vosoughi
Yapeng Tian
Pooyan Fazli
Chenliang Xu
48
8
0
27 Feb 2024
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
Yao Mu
Junting Chen
Qinglong Zhang
Shoufa Chen
Qiaojun Yu
...
Wenhai Wang
Jifeng Dai
Yu Qiao
Mingyu Ding
Ping Luo
46
22
0
25 Feb 2024
CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation
Jun Wang
Yuzhe Qin
Kaiming Kuang
Yigit Korkmaz
Akhilan Gurumoorthy
Hao Su
Xiaolong Wang
45
20
0
22 Feb 2024
Towards Diverse Behaviors: A Benchmark for Imitation Learning with Human Demonstrations
Xiaogang Jia
Denis Blessing
Xinkai Jiang
Moritz Reuss
Atalay Donat
Rudolf Lioutikov
Gerhard Neumann
47
20
0
22 Feb 2024
Video ReCap: Recursive Captioning of Hour-Long Videos
Md. Mohaiminul Islam
Ngan Ho
Xitong Yang
Tushar Nagarajan
Lorenzo Torresani
Gedas Bertasius
VGen
VLM
35
47
0
20 Feb 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
50
29
0
20 Feb 2024
The Visual Experience Dataset: Over 200 Recorded Hours of Integrated Eye Movement, Odometry, and Egocentric Video
Michelle R. Greene
Benjamin Balas
M. Lescroart
Paul MacNeilage
Jennifer A. Hart
...
Matthew W. Shinkle
Wentao Si
Brian Szekely
Joaquin M. Torres
Eliana Weissmann
MDE
24
2
0
15 Feb 2024
Advancing Human Action Recognition with Foundation Models trained on Unlabeled Public Videos
Yang Qian
Yinan Sun
A. Kargarandehkordi
Parnian Azizian
O. Mutlu
Saimourya Surabhi
Pingyi Chen
Zain Jabbar
Dennis Paul Wall
Peter Washington
OffRL
29
1
0
14 Feb 2024
THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation
Wilbert Pumacay
Ishika Singh
Jiafei Duan
Ranjay Krishna
Jesse Thomason
Dieter Fox
29
40
0
13 Feb 2024
Learning by Watching: A Review of Video-based Learning Approaches for Robot Manipulation
Chrisantus Eze
Christopher Crick
SSL
82
12
0
11 Feb 2024
Memory Consolidation Enables Long-Context Video Understanding
Ivana Balažević
Yuge Shi
Pinelopi Papalampidi
Rahma Chaabouni
Skanda Koppula
Olivier J. Hénaff
108
24
0
08 Feb 2024
Real-World Robot Applications of Foundation Models: A Review
Kento Kawaharazuka
T. Matsushima
Andrew Gambardella
Jiaxian Guo
Chris Paxton
Andy Zeng
OffRL
VLM
LM&Ro
51
47
0
08 Feb 2024
PoCo: Policy Composition from and for Heterogeneous Robot Learning
Lirui Wang
Jialiang Zhao
Yilun Du
Edward H. Adelson
Russ Tedrake
79
28
0
04 Feb 2024
Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
Haoyi Zhu
Yating Wang
Di Huang
Weicai Ye
Wanli Ouyang
Tong He
SSL
3DPC
59
20
0
04 Feb 2024