Ego4D: Around the World in 3,000 Hours of Egocentric Video

13 October 2021
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
Rohit Girdhar
Jackson Hamburger
Hao Jiang
Miao Liu
Xingyu Liu
Miguel Martin
Tushar Nagarajan
Ilija Radosavovic
Santhosh Kumar Ramakrishnan
Fiona Ryan
J. Sharma
Michael Wray
Mengmeng Xu
Eric Z. Xu
Chen Zhao
Siddhant Bansal
Dhruv Batra
Vincent Cartillier
Sean Crane
Tien Do
Morrie Doulaty
Akshay Erapalli
Christoph Feichtenhofer
A. Fragomeni
Qichen Fu
A. Gebreselasie
Cristina González
James M. Hillis
Xuhua Huang
Yifei Huang
Wenqi Jia
Weslie Khoo
J. Kolár
Satwik Kottur
Anurag Kumar
F. Landini
Chao Li
Yanghao Li
Zhenqiang Li
K. Mangalam
Raghava Modhugu
Jonathan Munro
Tullie Murrell
Takumi Nishiyasu
Will Price
Paola Ruiz Puentes
Merey Ramazanova
Leda Sari
Kiran Somasundaram
Audrey Southerland
Yusuke Sugano
Ruijie Tao
Minh Vo
Yuchen Wang
Xindi Wu
Takuma Yagi
Ziwei Zhao
Yunyi Zhu
Pablo Arbelaez
David J. Crandall
Dima Damen
G. Farinella
Christian Fuegen
Guohao Li
V. Ithapu
C. V. Jawahar
Hanbyul Joo
Kris M. Kitani
Haizhou Li
Richard Newcombe
A. Oliva
H. Park
James M. Rehg
Yoichi Sato
Jianbo Shi
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
    EgoV

Papers citing "Ego4D: Around the World in 3,000 Hours of Egocentric Video"

50 / 791 papers shown
AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation
Lorenzo Mur-Labadia
Ruben Martinez-Cantin
Jose J. Guerrero
G. Farinella
Antonino Furnari
37
4
0
03 Jun 2024
Object Aware Egocentric Online Action Detection
Joungbin An
Yunsu Park
Hyolim Kang
Seon Joo Kim
EgoV
31
0
0
03 Jun 2024
Learning Manipulation by Predicting Interaction
Jia Zeng
Qingwen Bu
Bangjun Wang
Wenke Xia
Li Chen
...
Heming Cui
Bin Zhao
Xuelong Li
Yu Qiao
Hongyang Li
58
21
0
01 Jun 2024
HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model
Khoa T. Vo
Thinh Phan
Kashu Yamazaki
Minh-Triet Tran
Ngan Le
45
1
0
01 Jun 2024
Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models
Himangi Mittal
Nakul Agarwal
Shao-Yuan Lo
Kwonjoon Lee
44
14
0
30 May 2024
EMAG: Ego-motion Aware and Generalizable 2D Hand Forecasting from Egocentric Videos
Masashi Hatano
Ryo Hachiuma
Hideo Saito
EgoV
37
3
0
30 May 2024
Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics
Minttu Alakuijala
Reginald McLean
Isaac Woungang
Nariman Farsad
Samuel Kaski
Pekka Marttinen
Kai Yuan
LM&Ro
42
1
0
30 May 2024
Encoding and Controlling Global Semantics for Long-form Video Question Answering
Thong Nguyen
Zhiyuan Hu
Xiaobao Wu
Cong-Duy Nguyen
See-Kiong Ng
A. Luu
43
3
0
30 May 2024
Streaming Long Video Understanding with Large Language Models
Rui Qian
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Shuangrui Ding
Dahua Lin
Jiaqi Wang
VLM
39
41
0
25 May 2024
Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders For Manipulation Policies
Jianing Qian
Anastasios Panagopoulos
Dinesh Jayaraman
LM&Ro
ViT
38
5
0
24 May 2024
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Jialong Wu
Shaofeng Yin
Ningya Feng
Xu He
Dong Li
Haifeng Zhang
Mingsheng Long
VGen
49
22
0
24 May 2024
Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis
Basile Van Hoorick
Rundi Wu
Ege Ozguroglu
Kyle Sargent
Ruoshi Liu
P. Tokmakov
Achal Dave
Changxi Zheng
Carl Vondrick
DiffM
VGen
58
29
0
23 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
82
43
0
23 May 2024
No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models
Angeline Pouget
Lucas Beyer
Emanuele Bugliarello
Xiao Wang
Andreas Steiner
Xiao-Qi Zhai
Ibrahim M. Alabdulmohsin
VLM
33
7
0
22 May 2024
EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views
Yuhang Yang
Wei Zhai
Chengfeng Wang
Chengjun Yu
Yang Cao
Zheng-jun Zha
44
5
0
22 May 2024
Active Object Detection with Knowledge Aggregation and Distillation from Large Models
Dejie Yang
Yang Liu
39
3
0
21 May 2024
WorldAfford: Affordance Grounding based on Natural Language Instructions
Changmao Chen
Yuren Cong
Zhen Kan
24
4
0
21 May 2024
Natural Language Can Help Bridge the Sim2Real Gap
Albert Yu
Adeline Foote
Raymond J. Mooney
Roberto Martín-Martín
LM&Ro
51
11
0
16 May 2024
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
Yunhao Ge
Yihe Tang
Lyne Tchapmi
Cem Gokmen
Chengshu Li
...
Miao Liu
Pengchuan Zhang
Ruohan Zhang
Fei-Fei Li
Jiajun Wu
VGen
50
6
0
15 May 2024
Hearing Touch: Audio-Visual Pretraining for Contact-Rich Manipulation
Jared Mejia
Victoria Dean
Tess Hellebrekers
Abhinav Gupta
45
12
0
14 May 2024
Coarse or Fine? Recognising Action End States without Labels
Davide Moltisanti
Hakan Bilen
Laura Sevilla-Lara
Frank Keller
43
0
0
13 May 2024
Bidirectional Progressive Transformer for Interaction Intention Anticipation
Zichen Zhang
Hongcheng Luo
Wei Zhai
Yang Cao
Yu Kang
27
5
0
09 May 2024
Diff-IP2D: Diffusion-Based Hand-Object Interaction Prediction on Egocentric Videos
Junyi Ma
Jingyi Xu
Xieyuanli Chen
Hesheng Wang
VGen
38
7
0
07 May 2024
OmniActions: Predicting Digital Actions in Response to Real-World Multimodal Sensory Inputs with LLMs
Jiahao Nick Li
Yan Xu
Tovi Grossman
Stephanie Santosa
Michelle Li
41
13
0
06 May 2024
ScrewMimic: Bimanual Imitation from Human Videos with Screw Space Projection
Arpit Bahety
Priyanka Mandikal
Ben Abbatematteo
Roberto Martín-Martín
42
14
0
06 May 2024
Meta-Evolve: Continuous Robot Evolution for One-to-many Policy Transfer
Xingyu Liu
Deepak Pathak
Ding Zhao
36
6
0
06 May 2024
WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning
Yuanhan Zhang
Kaichen Zhang
Bo-wen Li
Fanyi Pu
Christopher Arif Setiadharma
Jingkang Yang
Ziwei Liu
VGen
52
7
0
06 May 2024
Track2Act: Predicting Point Tracks from Internet Videos enables Diverse Zero-shot Robot Manipulation
Homanga Bharadhwaj
Roozbeh Mottaghi
Abhinav Gupta
Shubham Tulsiani
3DPC
54
16
0
02 May 2024
LEGENT: Open Platform for Embodied Agents
Zhili Cheng
Zhitong Wang
Jinyi Hu
Shengding Hu
An Liu
Yuge Tu
Pengkai Li
Lei Shi
Zhiyuan Liu
Maosong Sun
VLM
33
6
0
28 Apr 2024
What Foundation Models can Bring for Robot Learning in Manipulation : A Survey
Dingzhe Li
Yixiang Jin
A. Yong
Hongze Yu
Jun Shi
Xiaoshuai Hao
Peng Hao
Huaping Liu
Gang Hua
Bin Fang
AI4CE
LM&Ro
72
13
0
28 Apr 2024
Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations
Puhao Li
Tengyu Liu
Yuyang Li
Muzhi Han
Haoran Geng
Shu Wang
Yixin Zhu
Song-Chun Zhu
Siyuan Huang
39
16
0
26 Apr 2024
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Lin Xu
Yilin Zhao
Daquan Zhou
Zhijie Lin
See Kiong Ng
Jiashi Feng
MLLM
VLM
38
159
0
25 Apr 2024
ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos
Zerui Chen
Shizhe Chen
Cordelia Schmid
Ivan Laptev
35
13
0
24 Apr 2024
Rank2Reward: Learning Shaped Reward Functions from Passive Video
Daniel Yang
Davin Tjia
Jacob Berg
Dima Damen
Pulkit Agrawal
Abhishek Gupta
OffRL
40
5
0
23 Apr 2024
CrossScore: Towards Multi-View Image Evaluation and Scoring
Zirui Wang
Wenjing Bian
Omkar M. Parkhi
Yuheng Ren
V. Prisacariu
51
1
0
22 Apr 2024
Composing Pre-Trained Object-Centric Representations for Robotics From "What" and "Where" Foundation Models
Junyao Shi
Jianing Qian
Yecheng Jason Ma
Dinesh Jayaraman
OCL
38
4
0
20 Apr 2024
Resilience through Scene Context in Visual Referring Expression Generation
Simeon Junker
Sina Zarrieß
36
0
0
18 Apr 2024
Sequential Compositional Generalization in Multimodal Models
Semih Yagcioglu
Osman Batur İnce
Aykut Erdem
Erkut Erdem
Desmond Elliott
Deniz Yuret
41
1
0
18 Apr 2024
EgoPet: Egomotion and Interaction Data from an Animal's Perspective
Amir Bar
Arya Bakhtiar
Danny Tran
Antonio Loquercio
Jathushan Rajasegaran
Yann LeCun
Amir Globerson
Trevor Darrell
EgoV
41
4
0
15 Apr 2024
HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision
Siddhant Bansal
Michael Wray
Dima Damen
41
3
0
15 Apr 2024
A Survey on Multimodal Wearable Sensor-based Human Action Recognition
Jianyuan Ni
Hao Tang
Syed Tousiful Haque
Yan Yan
A. Ngu
77
6
0
14 Apr 2024
In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition
Wiktor Mucha
Martin Kampel
EgoV
27
6
0
14 Apr 2024
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min
Shyamal Buch
Arsha Nagrani
Minsu Cho
Cordelia Schmid
LRM
44
20
0
09 Apr 2024
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
Changan Chen
Kumar Ashutosh
Rohit Girdhar
David Harwath
Kristen Grauman
EgoV
SSL
28
6
0
08 Apr 2024
Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind
Chiara Plizzari
Shubham Goel
Toby Perrett
Jacob Chalk
Angjoo Kanazawa
Dima Damen
41
10
0
07 Apr 2024
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
Yingsen Zeng
Yujie Zhong
Chengjian Feng
Lin Ma
63
7
0
07 Apr 2024
JUICER: Data-Efficient Imitation Learning for Robotic Assembly
Lars Ankile
Anthony Simeonov
Idan Shenfeld
Pulkit Agrawal
LM&Ro
42
15
0
04 Apr 2024
Is CLIP the main roadblock for fine-grained open-world perception?
Lorenzo Bianchi
F. Carrara
Nicola Messina
Fabrizio Falchi
VLM
43
4
0
04 Apr 2024
LongVLM: Efficient Long Video Understanding via Large Language Models
Yuetian Weng
Mingfei Han
Haoyu He
Xiaojun Chang
Bohan Zhuang
VLM
68
57
0
04 Apr 2024
UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization
Tiantian Geng
Teng Wang
Yanfu Zhang
Jinming Duan
Weili Guan
Feng Zheng
29
2
0
04 Apr 2024