ResearchTrend.AI
Ego4D: Around the World in 3,000 Hours of Egocentric Video


13 October 2021
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
Rohit Girdhar
Jackson Hamburger
Hao Jiang
Miao Liu
Xingyu Liu
Miguel Martin
Tushar Nagarajan
Ilija Radosavovic
Santhosh Kumar Ramakrishnan
Fiona Ryan
J. Sharma
Michael Wray
Mengmeng Xu
Eric Z. Xu
Chen Zhao
Siddhant Bansal
Dhruv Batra
Vincent Cartillier
Sean Crane
Tien Do
Morrie Doulaty
Akshay Erapalli
Christoph Feichtenhofer
A. Fragomeni
Qichen Fu
A. Gebreselasie
Cristina González
James M. Hillis
Xuhua Huang
Yifei Huang
Wenqi Jia
Weslie Khoo
J. Kolár
Satwik Kottur
Anurag Kumar
F. Landini
Chao Li
Yanghao Li
Zhenqiang Li
K. Mangalam
Raghava Modhugu
Jonathan Munro
Tullie Murrell
Takumi Nishiyasu
Will Price
Paola Ruiz Puentes
Merey Ramazanova
Leda Sari
Kiran Somasundaram
Audrey Southerland
Yusuke Sugano
Ruijie Tao
Minh Vo
Yuchen Wang
Xindi Wu
Takuma Yagi
Ziwei Zhao
Yunyi Zhu
Pablo Arbelaez
David J. Crandall
Dima Damen
G. Farinella
Christian Fuegen
Guohao Li
V. Ithapu
C. V. Jawahar
Hanbyul Joo
Kris Kitani
Haizhou Li
Richard Newcombe
A. Oliva
H. Park
James M. Rehg
Yoichi Sato
Jianbo Shi
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
    EgoV

Papers citing "Ego4D: Around the World in 3,000 Hours of Egocentric Video"

50 / 791 papers shown
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan
Zitian Tang
Zhiqiu Yu
Chen Sun
59
1
0
30 Nov 2023
Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning
Yingdong Hu
Fanqi Lin
Tong Zhang
Li Yi
Yang Gao
LM&Ro
91
102
0
29 Nov 2023
PALM: Predicting Actions through Language Models
Sanghwan Kim
Daoji Huang
Yongqin Xian
Otmar Hilliges
Luc Van Gool
Xi Wang
VLM
24
10
0
29 Nov 2023
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Shuming Liu
Chen-Da Liu-Zhang
Chen Zhao
Guohao Li
38
25
0
28 Nov 2023
Panoptic Video Scene Graph Generation
Jingkang Yang
Wen-Hsiao Peng
Xiangtai Li
Zujin Guo
Liangyu Chen
...
Zheng Ma
Kaiyang Zhou
Wayne Zhang
Chen Change Loy
Ziwei Liu
VOS
60
42
0
28 Nov 2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
...
Jilan Xu
Guo Chen
Ping Luo
Limin Wang
Yu Qiao
VLM
MLLM
82
410
0
28 Nov 2023
Centre Stage: Centricity-based Audio-Visual Temporal Action Detection
Hanyuan Wang
Majid Mirmehdi
Dima Damen
Toby Perrett
52
2
0
28 Nov 2023
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos
Takehiko Ohkawa
Takuma Yagi
Taichi Nishimura
Ryosuke Furuta
Atsushi Hashimoto
Yoshitaka Ushiku
Yoichi Sato
EgoV
55
8
0
28 Nov 2023
On Bringing Robots Home
Nur Muhammad (Mahi) Shafiullah
Anant Rai
Haritheja Etukuru
Yiqian Liu
Ishan Misra
Soumith Chintala
Lerrel Pinto
33
77
0
27 Nov 2023
DiffAnt: Diffusion Models for Action Anticipation
Zeyun Zhong
Chengzhi Wu
Manuel Martin
Michael Voit
Juergen Gall
Jürgen Beyerer
DiffM
VGen
20
6
0
27 Nov 2023
Temporal Action Localization for Inertial-based Human Activity Recognition
Marius Bock
Michael Moeller
Kristof Van Laerhoven
30
0
0
27 Nov 2023
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Sijie Cheng
Zhicheng Guo
Jingwen Wu
Kechen Fang
Peng Li
Huaping Liu
Yang Liu
EgoV
LRM
36
16
0
27 Nov 2023
AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering
Xiuyuan Chen
Yuan Lin
Yuchen Zhang
Weiran Huang
ELM
MLLM
31
26
0
25 Nov 2023
XAGen: 3D Expressive Human Avatars Generation
Zhongcong Xu
Jianfeng Zhang
Jun Hao Liew
Jiashi Feng
Mike Zheng Shou
31
14
0
22 Nov 2023
Vamos: Versatile Action Models for Video Understanding
Shijie Wang
Qi Zhao
Minh Quan Do
Nakul Agarwal
Kwonjoon Lee
Chen Sun
29
19
0
22 Nov 2023
Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding
WonJun Moon
Sangeek Hyun
Subeen Lee
Jae-Pil Heo
32
4
0
15 Nov 2023
Generalizable Imitation Learning Through Pre-Trained Representations
Wei-Di Chang
F. Hogan
D. Meger
Gregory Dudek
41
1
0
15 Nov 2023
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
Yifei Zhou
Ayush Sekhari
Yuda Song
Wen Sun
OffRL
OnRL
30
8
0
14 Nov 2023
Aria-NeRF: Multimodal Egocentric View Synthesis
Jiankai Sun
Jianing Qiu
Chuanyang Zheng
Johnathan Tucker
Javier Yu
Mac Schwager
EgoV
40
5
0
11 Nov 2023
MultiIoT: Benchmarking Machine Learning for the Internet of Things
Shentong Mo
Louis-Philippe Morency
Russ Salakhutdinov
Paul Pu Liang
30
1
0
10 Nov 2023
OtterHD: A High-Resolution Multi-modality Model
Bo-wen Li
Peiyuan Zhang
Jingkang Yang
Yuanhan Zhang
Fanyi Pu
Ziwei Liu
VLM
MLLM
43
65
0
07 Nov 2023
On Hand-Held Grippers and the Morphological Gap in Human Manipulation Demonstration
Kiran Doshi
Yijiang Huang
Stelian Coros
27
6
0
03 Nov 2023
ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos
Te-Lin Wu
Zi-Yi Dou
Qingyuan Hu
Yu Hou
Nischal Reddy Chandra
Marjorie Freedman
R. Weischedel
Nanyun Peng
39
5
0
02 Nov 2023
The Power of the Senses: Generalizable Manipulation from Vision and Touch through Masked Multimodal Learning
Carmelo Sferrazza
Younggyo Seo
Hao Liu
Youngwoon Lee
Pieter Abbeel
46
15
0
02 Nov 2023
RoboVQA: Multimodal Long-Horizon Reasoning for Robotics
P. Sermanet
Tianli Ding
Jeffrey Zhao
Fei Xia
Debidatta Dwibedi
...
Pannag R. Sanketi
Karol Hausman
Izhak Shafran
Brian Ichter
Yuan Cao
LM&Ro
33
50
0
01 Nov 2023
ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab
Jieming Cui
Ziren Gong
Baoxiong Jia
Siyuan Huang
Zilong Zheng
Jianzhu Ma
Yixin Zhu
42
3
0
01 Nov 2023
Object-centric Video Representation for Long-term Action Anticipation
Ce Zhang
Changcheng Fu
Shijie Wang
Nakul Agarwal
Kwonjoon Lee
Chiho Choi
Chen Sun
42
14
0
31 Oct 2023
StairNet: Visual Recognition of Stairs for Human-Robot Locomotion
Andrew Garrett Kurbis
Dmytro Kuzmenko
Bogdan Ivanyuk-Skulskiy
Alex Mihailidis
Brokoslaw Laschowski
23
0
0
31 Oct 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
28
63
0
30 Oct 2023
A Dataset of Relighted 3D Interacting Hands
Gyeongsik Moon
Shunsuke Saito
Weipeng Xu
Rohan P. Joshi
Julia Buffalini
...
Tomas Simon
Bo Peng
Shubham Garg
Kevyn McPhail
Takaaki Shiratori
47
27
0
26 Oct 2023
Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions
Jinzheng Zhao
Yong-mei Xu
Xinyuan Qian
Davide Berghi
Peipei Wu
Meng Cui
Jianyuan Sun
Philip J. B. Jackson
Wenwu Wang
BDL
45
7
0
23 Oct 2023
Powerset multi-class cross entropy loss for neural speaker diarization
Alexis Plaquet
H. Bredin
109
91
0
19 Oct 2023
MISAR: A Multimodal Instructional System with Augmented Reality
Jing Bi
Nguyen Nguyen
A. Vosoughi
Chenliang Xu
54
11
0
18 Oct 2023
A Survey on Video Diffusion Models
Zhen Xing
Qijun Feng
Haoran Chen
Qi Dai
Hang-Rui Hu
Hang Xu
Zuxuan Wu
Yu-Gang Jiang
EGVM
VGen
59
117
0
16 Oct 2023
Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models
Kevin Black
Mitsuhiko Nakamoto
P. Atreya
Homer Walke
Chelsea Finn
Aviral Kumar
Sergey Levine
DiffM
LM&Ro
35
132
0
16 Oct 2023
Video Language Planning
Yilun Du
Mengjiao Yang
Peter R. Florence
Fei Xia
Ayzaan Wahid
...
Pieter Abbeel
Josh Tenenbaum
L. Kaelbling
Andy Zeng
Jonathan Tompson
PINN
LM&Ro
96
86
0
16 Oct 2023
Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning
Chahyon Ku
Carl Winge
Ryan Diaz
Wentao Yuan
Karthik Desingh
29
3
0
15 Oct 2023
An Unbiased Look at Datasets for Visuo-Motor Pre-Training
Sudeep Dasari
Mohan Kumar Srirama
Unnat Jain
Abhinav Gupta
SSL
34
36
0
13 Oct 2023
Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video
Shashanka Venkataramanan
Mamshad Nayeem Rizve
João Carreira
Yuki M. Asano
Yannis Avrithis
SSL
37
18
0
12 Oct 2023
Universal Visual Decomposer: Long-Horizon Manipulation Made Easy
Zichen Zhang
Yunshuang Li
Osbert Bastani
Abhishek Gupta
Dinesh Jayaraman
Yecheng Jason Ma
Luca Weihs
37
17
0
12 Oct 2023
What Matters to You? Towards Visual Representation Alignment for Robot Learning
Ran Tian
Chenfeng Xu
Masayoshi Tomizuka
Jitendra Malik
Andrea V. Bajcsy
24
9
0
11 Oct 2023
RoboCLIP: One Demonstration is Enough to Learn Robot Policies
Sumedh Anand Sontakke
Jesse Zhang
Sébastien M. R. Arnold
Karl Pertsch
Erdem Biyik
Dorsa Sadigh
Chelsea Finn
Laurent Itti
OffRL
27
66
0
11 Oct 2023
Learning Interactive Real-World Simulators
Mengjiao Yang
Yilun Du
Kamyar Ghasemipour
Jonathan Tompson
Leslie Kaelbling
Dale Schuurmans
Pieter Abbeel
LM&Ro
PINN
30
180
0
09 Oct 2023
TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models
Zuxin Liu
Jesse Zhang
Kavosh Asadi
Yao Liu
Ding Zhao
Shoham Sabach
Rasool Fakoor
ALM
AI4CE
23
26
0
09 Oct 2023
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models
Guangzhi Sun
Wenyi Yu
Changli Tang
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
38
12
0
09 Oct 2023
Graph learning in robotics: a survey
Francesca Pistilli
Giuseppe Averta
AI4CE
GNN
32
7
0
06 Oct 2023
Human-oriented Representation Learning for Robotic Manipulation
Mingxiao Huo
Mingyu Ding
Chenfeng Xu
Thomas Tian
Xinghao Zhu
Yao Mu
Lingfeng Sun
Masayoshi Tomizuka
Wei Zhan
SSL
50
12
0
04 Oct 2023
Discriminative Training of VBx Diarization
Dominik Klement
Mireia Díez
Federico Landini
Lukáš Burget
Anna Silnova
Marc Delcroix
Naohiro Tawara
48
2
0
04 Oct 2023
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Bin Zhu
Bin Lin
Munan Ning
Yang Yan
Jiaxi Cui
...
Zongwei Li
Wancai Zhang
Zhifeng Li
Wei Liu
Liejie Yuan
VLM
MLLM
32
205
0
03 Oct 2023
H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation
Yanjie Ze
Yuyao Liu
Ruizhe Shi
Jiaxin Qin
Zhecheng Yuan
Jiashun Wang
Huazhe Xu
34
1
0
02 Oct 2023