Ego4D: Around the World in 3,000 Hours of Egocentric Video

13 October 2021
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
Rohit Girdhar
Jackson Hamburger
Hao Jiang
Miao Liu
Xingyu Liu
Miguel Martin
Tushar Nagarajan
Ilija Radosavovic
Santhosh Kumar Ramakrishnan
Fiona Ryan
J. Sharma
Michael Wray
Mengmeng Xu
Eric Z. Xu
Chen Zhao
Siddhant Bansal
Dhruv Batra
Vincent Cartillier
Sean Crane
Tien Do
Morrie Doulaty
Akshay Erapalli
Christoph Feichtenhofer
A. Fragomeni
Qichen Fu
A. Gebreselasie
Cristina González
James M. Hillis
Xuhua Huang
Yifei Huang
Wenqi Jia
Weslie Khoo
J. Kolár
Satwik Kottur
Anurag Kumar
F. Landini
Chao Li
Yanghao Li
Zhenqiang Li
K. Mangalam
Raghava Modhugu
Jonathan Munro
Tullie Murrell
Takumi Nishiyasu
Will Price
Paola Ruiz Puentes
Merey Ramazanova
Leda Sari
Kiran Somasundaram
Audrey Southerland
Yusuke Sugano
Ruijie Tao
Minh Vo
Yuchen Wang
Xindi Wu
Takuma Yagi
Ziwei Zhao
Yunyi Zhu
Pablo Arbelaez
David J. Crandall
Dima Damen
G. Farinella
Christian Fuegen
Guohao Li
V. Ithapu
C. V. Jawahar
Hanbyul Joo
Kris M. Kitani
Haizhou Li
Richard Newcombe
A. Oliva
H. Park
James M. Rehg
Yoichi Sato
Jianbo Shi
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
    EgoV

Papers citing "Ego4D: Around the World in 3,000 Hours of Egocentric Video"

50 / 786 papers shown
EgoSim: An Egocentric Multi-view Simulator and Real Dataset for Body-worn Cameras during Motion and Activity
Dominik Hollidt
Paul Streli
Jiaxi Jiang
Yasaman Haghighi
Changlin Qian
Xintong Liu
Christian Holz
EgoV
82
2
0
25 Feb 2025
Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs
Gengyuan Zhang
Mingcong Ding
Tong Liu
Yao Zhang
Volker Tresp
82
1
0
24 Feb 2025
Brain-Model Evaluations Need the NeuroAI Turing Test
Jenelle Feather
Meenakshi Khosla
N. Apurva Ratan Murty
Aran Nayebi
90
3
0
22 Feb 2025
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation
Kun Wu
Chengkai Hou
Jiaming Liu
Zhengping Che
Xiaozhu Ju
...
Zhenyu Wang
Pengju An
Siyuan Qian
S. Zhang
Jian Tang
LM&Ro
113
15
0
17 Feb 2025
EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild
Junhyeok Kim
Min Soo Kim
Jiwan Chung
Jungbin Cho
Jisoo Kim
Sungwoong Kim
Gyeongbo Sim
Youngjae Yu
EgoV
55
0
0
17 Feb 2025
SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding
Zhenyu Yang
Yihan Hu
Zemin Du
Dizhan Xue
Shengsheng Qian
Jiahong Wu
Fan Yang
W. Dong
Changsheng Xu
47
4
0
15 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
105
4
0
12 Feb 2025
DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control
Junjie Wen
Bo Li
Jinming Li
Zhibin Tang
Chaomin Shen
Feifei Feng
VLM
61
12
0
09 Feb 2025
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
Yi Wang
Xinhao Li
Ziang Yan
Yinan He
Jiashuo Yu
...
Kai Chen
Wenhai Wang
Yu Qiao
Yali Wang
Limin Wang
89
19
0
21 Jan 2025
Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics
Tze Ho Elden Tse
Runyang Feng
Linfang Zheng
Jiho Park
Yixing Gao
Jihie Kim
A. Leonardis
H. Chang
49
0
0
13 Jan 2025
Motion Tracks: A Unified Representation for Human-Robot Transfer in Few-Shot Imitation Learning
Juntao Ren
Priya Sundaresan
Dorsa Sadigh
Sanjiban Choudhury
Jeannette Bohg
37
14
0
13 Jan 2025
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos
Luigi Seminara
G. Farinella
Antonino Furnari
64
7
0
10 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
81
2
0
10 Jan 2025
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Jiaqi Wang
Kaipeng Zhang
Dahua Lin
Yu Qiao
Peng Gao
Xiangyu Yue
MLLM
104
109
0
10 Jan 2025
Graph-Based Multimodal and Multi-view Alignment for Keystep Recognition
Julia Lee Romero
Kyle Min
Subarna Tripathi
Morteza Karimzadeh
35
0
0
07 Jan 2025
Human Gaze Boosts Object-Centered Representation Learning
Timothy Schaumlöffel
A. Aubret
Gemma Roig
Jochen Triesch
36
0
0
06 Jan 2025
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
Wenyi Hong
Yean Cheng
Z. Yang
Weihan Wang
Lefan Wang
Xiaotao Gu
Shiyu Huang
Yuxiao Dong
J. Tang
CoGe
VLM
71
4
0
06 Jan 2025
MLVU: Benchmarking Multi-task Long Video Understanding
Junjie Zhou
Yan Shu
Bo Zhao
Boya Wu
Zhengyang Liang
...
Xi Yang
Y. Xiong
Bo Zhang
Tiejun Huang
Zheng Liu
VLM
58
11
0
03 Jan 2025
Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model
Y. Huang
Jilan Xu
Baoqi Pei
Yuping He
Guo Chen
...
Kunpeng Li
C. Yuan
Yidan Wang
Yu Qiao
L. Wang
78
4
0
31 Dec 2024
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Xinhao Li
Yi Wang
Jiashuo Yu
Xiangyu Zeng
Yuhan Zhu
...
Yinan He
Chenting Wang
Yu Qiao
Yali Wang
L. Wang
VLM
77
25
0
31 Dec 2024
A Toolkit for Virtual Reality Data Collection
Tim Rolff
Niklas Hypki
Markus Lappe
Frank Steinicke
26
0
0
23 Dec 2024
Interact with me: Joint Egocentric Forecasting of Intent to Interact, Attitude and Social Actions
Tongfei Bian
Yiming Ma
Mathieu Chollet
Victor Sanchez
T. Guha
EgoV
97
1
0
21 Dec 2024
Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
Yang Tian
Sizhe Yang
Jia Zeng
P. Wang
Dahua Lin
Hao Dong
Jiangmiao Pang
84
17
0
19 Dec 2024
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
Jihan Yang
Shusheng Yang
Anjali W. Gupta
Rilyn Han
Li Fei-Fei
Saining Xie
LRM
121
51
0
18 Dec 2024
Do Language Models Understand Time?
Xi Ding
Lei Wang
181
0
0
18 Dec 2024
HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction
Chen Bao
Jiarui Xu
Xiaolong Wang
Abhinav Gupta
Homanga Bharadhwaj
76
2
0
17 Dec 2024
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control
Mariam Hassan
Sebastian Stapf
Ahmad Rahimi
Pedro M B Rezende
Yasaman Haghighi
...
Mathieu Salzmann
Davide Scaramuzza
Marc Pollefeys
Paolo Favaro
Alexandre Alahi
VLM
VGen
77
5
0
15 Dec 2024
Detecting Activities of Daily Living in Egocentric Video to Contextualize Hand Use at Home in Outpatient Neurorehabilitation Settings
Adesh Kadambi
José Zariffa
EgoV
70
2
0
14 Dec 2024
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Pan Zhang
Xiaoyi Dong
Yuhang Cao
Yuhang Zang
Rui Qian
...
Xiaotian Zhang
K. Chen
Yu Qiao
Dahua Lin
Jiaqi Wang
KELM
84
12
0
12 Dec 2024
GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-grained Video-language Learning
Yunhong Wang
Zhikang Zhang
Jue Wang
D. Fan
Zhenlin Xu
Linda Liu
Xiang Hao
Vimal Bhat
Xinyu Li
VLM
82
1
0
10 Dec 2024
RoboMM: All-in-One Multimodal Large Model for Robotic Manipulation
Feng Yan
Fanfan Liu
Liming Zheng
Yufeng Zhong
Yiyang Huang
Zechao Guan
Chengjian Feng
Lin Ma
84
2
0
10 Dec 2024
EgoPoints: Advancing Point Tracking for Egocentric Videos
Ahmad Darkhalil
Rhodri Guerrier
Adam W. Harley
Dima Damen
72
2
0
05 Dec 2024
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Lu Qiu
Yuying Ge
Yi Chen
Yixiao Ge
Ying Shan
Xihui Liu
LLMAG
LRM
98
5
0
05 Dec 2024
Streaming Detection of Queried Event Start
Cristobal Eyzaguirre
Eric Tang
S. Buch
Adrien Gaidon
Jiajun Wu
Juan Carlos Niebles
79
0
0
04 Dec 2024
Navigation World Models
Amir Bar
G. Zhou
Danny Tran
Trevor Darrell
Yann LeCun
VGen
EgoV
82
14
0
04 Dec 2024
EgoCast: Forecasting Egocentric Human Pose in the Wild
María Escobar
Juanita Puentes
Cristhian Forigua
Jordi Pont-Tuset
Kevis-Kokitsi Maninis
Pablo Arbeláez
EgoV
73
2
0
03 Dec 2024
FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation
Kefan Chen
Chaerin Min
Linguang Zhang
Shreyas Hampali
Cem Keskin
Srinath Sridhar
77
0
0
03 Dec 2024
RELOCATE: A Simple Training-Free Baseline for Visual Query Localization Using Region-Based Representations
Savya Khosla
S. Vallecorsa
A. Schwing
Derek Hoiem
59
0
0
02 Dec 2024
VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval
Dhiman Paul
Md Rizwan Parvez
Nabeel Mohammed
Shafin Rahman
VGen
75
0
0
02 Dec 2024
Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks
Joseph Raj Vishal
Divesh Basina
Aarya Choudhary
Bharatesh Chakravarthi
67
1
0
02 Dec 2024
SEAL: Semantic Attention Learning for Long Video Representation
Lan Wang
Yujia Chen
Wen-Sheng Chu
Vishnu Naresh Boddeti
Du Tran
VLM
75
0
0
02 Dec 2024
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
Jianping Jiang
Weiye Xiao
Zhengyu Lin
H. Zhang
Tianxiang Ren
Yang Gao
Zhiqian Lin
Zhongang Cai
Lei Yang
Ziwei Liu
86
3
0
29 Nov 2024
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
Yueru Jia
Jiaming Liu
Sixiang Chen
Chenyang Gu
Z. Wang
...
Lily Lee
Pengwei Wang
Zhongyuan Wang
Renrui Zhang
Shanghang Zhang
89
11
0
27 Nov 2024
PROGRESSOR: A Perceptually Guided Reward Estimator with Self-Supervised Online Refinement
Tewodros Ayalew
Xiao Zhang
Kevin Yuanbo Wu
Tianchong Jiang
Michael Maire
Matthew R. Walter
OffRL
75
1
0
26 Nov 2024
Online Episodic Memory Visual Query Localization with Egocentric Streaming Object Memory
Zaira Manigrasso
Matteo Dunnhofer
Antonino Furnari
Moritz Nottebaum
Antonio Finocchiaro
Davide Marana
G. Farinella
C. Micheloni
78
1
0
25 Nov 2024
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng
Yijiang Li
Wanpeng Zhang
Sipeng Zheng
Zongqing Lu
109
1
0
25 Nov 2024
Is 'Right' Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models through Egocentric Instruction Tuning
Ji Hyeok Jung
Eun Tae Kim
S. Kim
Joo Ho Lee
Bumsoo Kim
Buru Chang
VLM
184
0
0
24 Nov 2024
ACE: Action Concept Enhancement of Video-Language Models in Procedural Videos
Reza Ghoddoosian
Nakul Agarwal
Isht Dwivedi
Behzad Dariush
68
0
0
23 Nov 2024
Extending Video Masked Autoencoders to 128 frames
N. B. Gundavarapu
Luke Friedman
Raghav Goyal
Chaitra Hegde
Eirikur Agustsson
...
Mikhail Sirotenko
Ming Yang
Tobias Weyand
Boqing Gong
Leonid Sigal
82
1
0
20 Nov 2024
VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation
Youpeng Wen
Junfan Lin
Bo Li
J. Han
Hang Xu
Shen Zhao
Xiaodan Liang
VGen
DiffM
43
2
0
14 Nov 2024