Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2206.01670
Cited By
Egocentric Video-Language Pretraining
3 June 2022
Kevin Qinghong Lin
Alex Jinpeng Wang
Mattia Soldan
Michael Wray
Rui Yan
Eric Z. Xu
Difei Gao
Rong-Cheng Tu
Wenzhe Zhao
Weijie Kong
Chengfei Cai
Hongfa Wang
Dima Damen
Bernard Ghanem
Wei Liu
Mike Zheng Shou
VLM
EgoV
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Egocentric Video-Language Pretraining"
50 / 159 papers shown
Title
Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition
Qitong Wang
Long Zhao
Liangzhe Yuan
Ting Liu
Xi Peng
28
12
0
22 Aug 2023
Opening the Vocabulary of Egocentric Actions
Dibyadip Chatterjee
Fadime Sener
Shugao Ma
Angela Yao
VLM
36
16
0
22 Aug 2023
ViT-Lens: Initiating Omni-Modal Exploration through 3D Insights
Weixian Lei
Yixiao Ge
Jianfeng Zhang
Dylan Sun
Kun Yi
Ying Shan
Mike Zheng Shou
25
1
0
20 Aug 2023
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
VLM
26
19
0
15 Aug 2023
Memory-and-Anticipation Transformer for Online Action Understanding
Jiahao Wang
Guo Chen
Yifei Huang
Liming Wang
Tong Lu
OffRL
54
37
0
15 Aug 2023
ARGUS: Visualization of AI-Assisted Task Guidance in AR
Sonia Castelo
Joao Rulff
Erin McGowan
Bea Steers
Guande Wu
...
Qinghong Sun
Huy Q. Vo
J. P. Bello
M. Krone
Claudio Silva
29
18
0
11 Aug 2023
UniVTG: Towards Unified Video-Language Temporal Grounding
Kevin Qinghong Lin
Pengchuan Zhang
Joya Chen
Shraman Pramanick
Difei Gao
Alex Jinpeng Wang
Rui Yan
Mike Zheng Shou
21
112
0
31 Jul 2023
Video-Mined Task Graphs for Keystep Recognition in Instructional Videos
Kumar Ashutosh
Santhosh Kumar Ramakrishnan
Triantafyllos Afouras
Kristen Grauman
21
23
0
17 Jul 2023
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
Shraman Pramanick
Yale Song
Sayan Nag
Kevin Qinghong Lin
Hardik Shah
Mike Zheng Shou
Ramalingam Chellappa
Pengchuan Zhang
VLM
39
86
0
11 Jul 2023
NMS Threshold matters for Ego4D Moment Queries -- 2nd place solution to the Ego4D Moment Queries Challenge 2023
Lin Sui
Fangzhou Mu
Yin Li
20
3
0
05 Jul 2023
Palm: Predicting Actions through Language Models @ Ego4D Long-Term Action Anticipation Challenge 2023
Daoji Huang
Otmar Hilliges
Luc Van Gool
Xi Wang
LRM
25
13
0
28 Jun 2023
SpotEM: Efficient Video Search for Episodic Memory
Santhosh Kumar Ramakrishnan
Ziad Al-Halah
Kristen Grauman
VLM
28
9
0
28 Jun 2023
First Place Solution to the CVPR'2023 AQTC Challenge: A Function-Interaction Centric Approach with Spatiotemporal Visual-Language Alignment
Tom Tongjia Chen
Hongshan Yu
Zhengeng Yang
Ming Li
Zechuan Li
Jingwen Wang
Wei Miao
Wei Sun
Chen Chen
22
2
0
23 Jun 2023
Action Sensitivity Learning for the Ego4D Episodic Memory Challenge 2023
Jiayi Shao
Xiaohan Wang
Ruijie Quan
Yezhou Yang
EgoV
19
8
0
15 Jun 2023
What can a cook in Italy teach a mechanic in India? Action Recognition Generalisation Over Scenarios and Locations
Chiara Plizzari
Toby Perrett
Barbara Caputo
Dima Damen
EgoV
13
16
0
14 Jun 2023
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
Difei Gao
Lei Ji
Luowei Zhou
Kevin Lin
Joya Chen
Zihan Fan
Mike Zheng Shou
MLLM
27
71
0
14 Jun 2023
Global and Local Semantic Completion Learning for Vision-Language Pre-training
Rong-Cheng Tu
Yatai Ji
Jie Jiang
Weijie Kong
Chengfei Cai
Wenzhe Zhao
Hongfa Wang
Yujiu Yang
Wei Liu
VLM
24
2
0
12 Jun 2023
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment
Zihui Xue
Kristen Grauman
EgoV
31
30
0
08 Jun 2023
An Overview of Challenges in Egocentric Text-Video Retrieval
Burak Satar
Huaiyu Zhu
Hanwang Zhang
J. Lim
EgoV
32
1
0
07 Jun 2023
Too Large; Data Reduction for Vision-Language Pre-Training
Alex Jinpeng Wang
Kevin Qinghong Lin
David Junhao Zhang
Stan Weixian Lei
Mike Zheng Shou
VLM
28
24
0
31 May 2023
Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning
Sanjoy Kundu
Shubham Trehan
Sathyanarayanan N. Aakur
LRM
LM&Ro
19
1
0
26 May 2023
Action Sensitivity Learning for Temporal Action Localization
Jiayi Shao
Xiaohan Wang
Ruijie Quan
Junjun Zheng
Jiang Yang
Yezhou Yang
23
22
0
25 May 2023
Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective
Thanh-Dat Truong
Khoa Luu
EgoV
27
10
0
25 May 2023
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
Yao Mu
Qinglong Zhang
Mengkang Hu
Wen Wang
Mingyu Ding
Jun Jin
Bin Wang
Jifeng Dai
Yu Qiao
Ping Luo
LM&Ro
LRM
23
219
0
24 May 2023
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
92
76
0
22 May 2023
Paxion: Patching Action Knowledge in Video-Language Foundation Models
Zhenhailong Wang
Ansel Blume
Sha Li
Genglin Liu
Jaemin Cho
Zineng Tang
Mohit Bansal
Heng Ji
KELM
VGen
17
26
0
18 May 2023
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
Bolin Lai
Fiona Ryan
Wenqi Jia
Miao Liu
James M. Rehg
EgoV
19
8
0
06 May 2023
Boundary-Denoising for Video Activity Localization
Mengmeng Xu
Mattia Soldan
Jialin Gao
Shuming Liu
Juan-Manuel Perez-Rua
Bernard Ghanem
19
10
0
06 Apr 2023
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya
Anurag Arnab
Arsha Nagrani
Michael S. Ryoo
29
19
0
05 Apr 2023
Procedure-Aware Pretraining for Instructional Video Understanding
Honglu Zhou
Roberto Martín-Martín
Mubbasir Kapadia
Silvio Savarese
Juan Carlos Niebles
23
38
0
31 Mar 2023
Affordance Grounding from Demonstration Video to Target Image
Joya Chen
Difei Gao
Kevin Qinghong Lin
Mike Zheng Shou
19
24
0
26 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
23
220
0
27 Feb 2023
Localizing Moments in Long Video Via Multimodal Guidance
Wayner Barrios
Mattia Soldan
Alberto M. Ceballos-Arroyo
Fabian Caba Heilbron
Bernard Ghanem
24
20
0
26 Feb 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
12
7
0
16 Feb 2023
HierVL: Learning Hierarchical Video-Language Embeddings
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
VLM
AI4TS
20
51
0
05 Jan 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
18
4
0
05 Jan 2023
Test of Time: Instilling Video-Language Models with a Sense of Time
Piyush Bagad
Makarand Tapaswi
Cees G. M. Snoek
78
36
0
05 Jan 2023
Ego-Only: Egocentric Action Detection without Exocentric Transferring
Huiyu Wang
Mitesh Singh
Lorenzo Torresani
EgoV
69
22
0
03 Jan 2023
STEPs: Self-Supervised Key Step Extraction and Localization from Unlabeled Procedural Videos
Anshul B. Shah
Benjamin Lundell
H. Sawhney
Ramalingam Chellappa
SSL
16
8
0
02 Jan 2023
NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory
Santhosh Kumar Ramakrishnan
Ziad Al-Halah
Kristen Grauman
111
39
0
02 Jan 2023
Egocentric Video Task Translation
Zihui Xue
Yale Song
Kristen Grauman
Lorenzo Torresani
EgoV
21
13
0
13 Dec 2022
Learning Video Representations from Large Language Models
Yue Zhao
Ishan Misra
Philipp Krahenbuhl
Rohit Girdhar
VLM
AI4TS
20
164
0
08 Dec 2022
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data
Roei Herzig
Ofir Abramovich
Elad Ben-Avraham
Assaf Arbelle
Leonid Karlinsky
Ariel Shamir
Trevor Darrell
Amir Globerson
34
16
0
08 Dec 2022
Multi-Task Learning of Object State Changes from Uncurated Videos
Tomávs Souvcek
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
26
11
0
24 Nov 2022
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning
Yatai Ji
Rong-Cheng Tu
Jie Jiang
Weijie Kong
Chengfei Cai
Wenzhe Zhao
Hongfa Wang
Yujiu Yang
Wei Liu
VLM
32
13
0
24 Nov 2022
EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens
Sun-Kyoo Hwang
Jaehong Yoon
Youngwan Lee
S. Hwang
21
6
0
19 Nov 2022
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges
Guo Chen
Sen Xing
Zhe Chen
Yi Wang
Kunchang Li
...
Hongjie Zhang
Tong Lu
Yali Wang
Liming Wang
Yu Qiao
33
46
0
17 Nov 2022
Where a Strong Backbone Meets Strong Features -- ActionFormer for Ego4D Moment Queries Challenge
Fangzhou Mu
Sicheng Mo
Gillian Wang
Yin Li
22
3
0
16 Nov 2022
An Efficient COarse-to-fiNE Alignment Framework @ Ego4D Natural Language Queries Challenge 2022
Zhijian Hou
Wanjun Zhong
Lei Ji
Difei Gao
Kun Yan
W. Chan
Chong-Wah Ngo
Zheng Shou
Nan Duan
6
6
0
16 Nov 2022
A Simple Transformer-Based Model for Ego4D Natural Language Queries Challenge
Sicheng Mo
Fangzhou Mu
Yin Li
20
7
0
16 Nov 2022
Previous
1
2
3
4
Next