
Ego4D: Around the World in 3,000 Hours of Egocentric Video
arXiv:2110.07058

13 October 2021
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
Rohit Girdhar
Jackson Hamburger
Hao Jiang
Miao Liu
Xingyu Liu
Miguel Martin
Tushar Nagarajan
Ilija Radosavovic
Santhosh Kumar Ramakrishnan
Fiona Ryan
J. Sharma
Michael Wray
Mengmeng Xu
Eric Z. Xu
Chen Zhao
Siddhant Bansal
Dhruv Batra
Vincent Cartillier
Sean Crane
Tien Do
Morrie Doulaty
Akshay Erapalli
Christoph Feichtenhofer
A. Fragomeni
Qichen Fu
A. Gebreselasie
Cristina González
James M. Hillis
Xuhua Huang
Yifei Huang
Wenqi Jia
Weslie Khoo
J. Kolár
Satwik Kottur
Anurag Kumar
F. Landini
Chao Li
Yanghao Li
Zhenqiang Li
K. Mangalam
Raghava Modhugu
Jonathan Munro
Tullie Murrell
Takumi Nishiyasu
Will Price
Paola Ruiz Puentes
Merey Ramazanova
Leda Sari
Kiran Somasundaram
Audrey Southerland
Yusuke Sugano
Ruijie Tao
Minh Vo
Yuchen Wang
Xindi Wu
Takuma Yagi
Ziwei Zhao
Yunyi Zhu
Pablo Arbelaez
David J. Crandall
Dima Damen
G. Farinella
Christian Fuegen
Guohao Li
V. Ithapu
C. V. Jawahar
Hanbyul Joo
Kris Kitani
Haizhou Li
Richard Newcombe
A. Oliva
H. Park
James M. Rehg
Yoichi Sato
Jianbo Shi
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
    EgoV

Papers citing "Ego4D: Around the World in 3,000 Hours of Egocentric Video"

50 / 791 papers shown
Title
Robot Fleet Learning via Policy Merging
Lirui Wang
Kaiqing Zhang
Allan Zhou
Max Simchowitz
Russ Tedrake
46
4
0
02 Oct 2023
GRID: A Platform for General Robot Intelligence Development
Sai H. Vemprala
Shuhang Chen
Abhinav Shukla
Dinesh Narayanan
Ashish Kapoor
25
10
0
02 Oct 2023
Region-centric Image-Language Pretraining for Open-Vocabulary Detection
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD
VLM
17
3
0
29 Sep 2023
Telling Stories for Common Sense Zero-Shot Action Recognition
Shreyank N. Gowda
Carolina Scarton
LM&Ro
30
2
0
29 Sep 2023
A Survey on Deep Learning Techniques for Action Anticipation
Zeyun Zhong
Manuel Martin
Michael Voit
Juergen Gall
Jürgen Beyerer
26
7
0
29 Sep 2023
HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World
Linghao Yang
Taein Kwon
Mahdi Rad
Bowen Pan
Ishani Chakraborty
...
Ashley Feniello
Rui Tian
Felipe Vieira Frujeri
Neel Joshi
Marc Pollefeys
EgoV
38
49
0
29 Sep 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
34
15
0
28 Sep 2023
Audio Visual Speaker Localization from EgoCentric Views
Jinzheng Zhao
Yong-mei Xu
Xinyuan Qian
Wenwu Wang
EgoV
35
5
0
28 Sep 2023
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Avamarie Brueggeman
Andrea Madotto
Zhaojiang Lin
Tushar Nagarajan
Matt Smith
...
Peyman Heidari
Yue Liu
Kavya Srinet
Babak Damavandi
Anuj Kumar
MLLM
34
93
0
27 Sep 2023
Chop & Learn: Recognizing and Generating Object-State Compositions
Nirat Saini
Hanyu Wang
Archana Swaminathan
Vinoj Jayasundara
Bo He
Kamal Gupta
Abhinav Shrivastava
CoGe
30
12
0
25 Sep 2023
Egocentric RGB+Depth Action Recognition in Industry-Like Settings
Jyoti Kini
Sarah Fleischer
I. Dave
Mubarak Shah
EgoV
26
2
0
25 Sep 2023
VidChapters-7M: Video Chapters at Scale
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
23
26
0
25 Sep 2023
Robotic Offline RL from Internet Videos via Value-Function Pre-Training
Chethan Bhateja
Derek Guo
Dibya Ghosh
Anika Singh
Manan Tomar
Q. Vuong
Yevgen Chebotar
Sergey Levine
Aviral Kumar
OffRL
38
20
0
22 Sep 2023
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Yuan Tseng
Layne Berry
Yi-Ting Chen
I-Hsiang Chiu
Hsuan-Hao Lin
...
Yu Tsao
Shinji Watanabe
Abdel-rahman Mohamed
Chi-Luen Feng
Hung-yi Lee
VLM
SSL
55
14
0
19 Sep 2023
Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention
Burak Satar
Huaiyu Zhu
Hanwang Zhang
Joo-Hwee Lim
CML
43
0
0
17 Sep 2023
CaSAR: Contact-aware Skeletal Action Recognition
Junan Lin
Zhichao Sun
Enjie Cao
Taein Kwon
Mahdi Rad
Marc Pollefeys
14
1
0
17 Sep 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
Nina Shvetsova
Anna Kukleva
Bernt Schiele
Hilde Kuehne
DiffM
33
3
0
16 Sep 2023
EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding
Chenchen Zhu
Fanyi Xiao
Andres Alvarado
Yasmine Babaei
Jiabo Hu
Hichem El-Mohri
Sean Culatana
Roshan Sumbaly
Zhicheng Yan
EgoV
35
19
0
15 Sep 2023
Masked Diffusion with Task-awareness for Procedure Planning in Instructional Videos
Fen Fang
Yun Liu
Ali Koksal
Qianli Xu
Joo-Hwee Lim
VGen
DiffM
26
5
0
14 Sep 2023
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding
Yue Xu
Yong-Lu Li
Zhemin Huang
Michael Xu Liu
Cewu Lu
Yu-Wing Tai
Chi-Keung Tang
EgoV
25
9
0
05 Sep 2023
RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking
Homanga Bharadhwaj
Jay Vakil
Mohit Sharma
Abhi Gupta
Shubham Tulsiani
Vikash Kumar
LM&Ro
21
116
0
05 Sep 2023
Language Reward Modulation for Pretraining Reinforcement Learning
Ademi Adeniji
Amber Xie
Carmelo Sferrazza
Younggyo Seo
Stephen James
Pieter Abbeel
39
26
0
23 Aug 2023
The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures
Christoph Reich
Tim Prangemeier
Heinz Koeppl
31
0
0
23 Aug 2023
RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D
Shuhei Kurita
Naoki Katsura
Eri Onami
EgoV
26
14
0
23 Aug 2023
Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition
Qitong Wang
Long Zhao
Liangzhe Yuan
Ting Liu
Xi Peng
36
12
0
22 Aug 2023
Opening the Vocabulary of Egocentric Actions
Dibyadip Chatterjee
Fadime Sener
Shugao Ma
Angela Yao
VLM
45
16
0
22 Aug 2023
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
K. Mangalam
Raiymbek Akshulakov
Jitendra Malik
25
253
0
17 Aug 2023
Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries
J. Wilkins
Justin Salamon
Magdalena Fuentes
J. P. Bello
Oriol Nieto
CLIP
14
5
0
17 Aug 2023
Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes
Zehan Wang
Haifeng Huang
Yang Zhao
Ziang Zhang
Zhou Zhao
19
62
0
17 Aug 2023
Leveraging Next-Active Objects for Context-Aware Anticipation in Egocentric Videos
Sanket Thakur
Cigdem Beyan
Pietro Morerio
Vittorio Murino
Alessio Del Bue
26
11
0
16 Aug 2023
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
VLM
26
20
0
15 Aug 2023
Memory-and-Anticipation Transformer for Online Action Understanding
Jiahao Wang
Guo Chen
Yifei Huang
Liming Wang
Tong Lu
OffRL
62
37
0
15 Aug 2023
An Outlook into the Future of Egocentric Vision
Chiara Plizzari
Gabriele Goletto
Antonino Furnari
Siddhant Bansal
Francesco Ragusa
G. Farinella
Dima Damen
Tatiana Tommasi
EgoV
40
38
0
14 Aug 2023
Encode-Store-Retrieve: Enhancing Memory Augmentation through Language-Encoded Egocentric Perception
Junxiao Shen
John J. Dudley
Per Ola Kristensson
RALM
25
0
0
10 Aug 2023
Scaling may be all you need for achieving human-level object recognition capacity with human-like visual experience
Emin Orhan
26
3
0
07 Aug 2023
MOMA-Force: Visual-Force Imitation for Real-World Mobile Manipulation
Taozheng Yang
Yaxing Jing
Hongtao Wu
Jiafeng Xu
Kuankuan Sima
Guangzeng Chen
Qie Sima
Tao Kong
20
18
0
07 Aug 2023
Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods
Ya Jing
Xuelin Zhu
Xingbin Liu
Qie Sima
Taozheng Yang
Yunhai Feng
Tao Kong
LM&Ro
45
16
0
07 Aug 2023
UniVTG: Towards Unified Video-Language Temporal Grounding
Kevin Qinghong Lin
Pengchuan Zhang
Joya Chen
Shraman Pramanick
Difei Gao
Alex Jinpeng Wang
Rui Yan
Mike Zheng Shou
29
113
0
31 Jul 2023
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
Qi Zhao
Shijie Wang
Ce Zhang
Changcheng Fu
Minh Quan Do
Nakul Agarwal
Kwonjoon Lee
Chen Sun
LM&Ro
53
49
0
31 Jul 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Kun Yuan
V. Srivastav
Tong Yu
Joël L. Lavanchy
Pietro Mascagni
N. Padoy
Nicolas Padoy
37
20
0
27 Jul 2023
Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition
Huy Ha
Peter R. Florence
Shuran Song
LM&Ro
47
151
0
26 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
Fahad Shahbaz Khan
VLM
38
118
0
25 Jul 2023
RoboChop: Autonomous Framework for Fruit and Vegetable Chopping Leveraging Foundational Models
Atharva Dikshit
Alison Bartsch
Abraham George
A. Farimani
30
10
0
24 Jul 2023
Multiscale Video Pretraining for Long-Term Activity Forecasting
Reuben Tan
Matthias De Lange
Michael L. Iuzzolino
Bryan A. Plummer
Kate Saenko
Karl Ridgeway
Lorenzo Torresani
AI4TS
27
6
0
24 Jul 2023
Meta-Transformer: A Unified Framework for Multimodal Learning
Yiyuan Zhang
Kaixiong Gong
Kaipeng Zhang
Hongsheng Li
Yu Qiao
Wanli Ouyang
Xiangyu Yue
33
137
0
20 Jul 2023
No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection
Qi Zhang
S. Zheng
Qin Jin
27
1
0
20 Jul 2023
Video-Mined Task Graphs for Keystep Recognition in Instructional Videos
Kumar Ashutosh
Santhosh Kumar Ramakrishnan
Triantafyllos Afouras
Kristen Grauman
29
24
0
17 Jul 2023
Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting
Wentao Bao
Lele Chen
Libing Zeng
Zhong Li
Yinghao Xu
Junsong Yuan
Yubo Kong
29
15
0
17 Jul 2023
Multimodal Distillation for Egocentric Action Recognition
Gorjan Radevski
Dusan Grujicic
Marie-Francine Moens
Matthew Blaschko
Tinne Tuytelaars
EgoV
30
23
0
14 Jul 2023
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Wenlong Huang
Chen Wang
Ruohan Zhang
Yunzhu Li
Jiajun Wu
Li Fei-Fei
LM&Ro
37
482
0
12 Jul 2023