Egocentric Video-Language Pretraining

3 June 2022
Kevin Qinghong Lin
Alex Jinpeng Wang
Mattia Soldan
Michael Wray
Rui Yan
Eric Z. Xu
Difei Gao
Rong-Cheng Tu
Wenzhe Zhao
Weijie Kong
Chengfei Cai
Hongfa Wang
Dima Damen
Bernard Ghanem
Wei Liu
Mike Zheng Shou
    VLM
    EgoV

Papers citing "Egocentric Video-Language Pretraining"

50 / 159 papers shown
VideoLLM-online: Online Video Large Language Model for Streaming Video
Joya Chen
Zhaoyang Lv
Shiwei Wu
Kevin Qinghong Lin
Chenan Song
Difei Gao
Jia-Wei Liu
Ziteng Gao
Dongxing Mao
Mike Zheng Shou
MLLM
MoMe
40
48
0
17 Jun 2024
EchoGuide: Active Acoustic Guidance for LLM-Based Eating Event Analysis from Egocentric Videos
Vineet Parikh
Saif Mahmud
Devansh Agarwal
Ke Li
François Guimbretière
Cheng Zhang
21
3
0
15 Jun 2024
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
Changan Chen
Puyuan Peng
Ami Baid
Zihui Xue
Wei-Ning Hsu
David F. Harwath
Kristen Grauman
VGen
39
7
0
13 Jun 2024
EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding
Yuan-Ming Li
Wei-Jin Huang
An-Lan Wang
Ling-an Zeng
Jing-Ke Meng
Wei-Shi Zheng
35
11
0
13 Jun 2024
ALGO: Object-Grounded Visual Commonsense Reasoning for Open-World Egocentric Action Recognition
Sanjoy Kundu
Shubham Trehan
Sathyanarayanan N. Aakur
LM&Ro
LRM
35
0
0
09 Jun 2024
FILS: Self-Supervised Video Feature Prediction In Semantic Language Space
Mona Ahmadian
Frank Guerin
Andrew Gilbert
44
1
0
05 Jun 2024
Contrastive Language Video Time Pre-training
Hengyue Liu
Kyle Min
Hector A. Valdez
Subarna Tripathi
VLM
22
0
0
04 Jun 2024
HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model
Khoa T. Vo
Thinh Phan
Kashu Yamazaki
Minh-Triet Tran
Ngan Le
45
1
0
01 Jun 2024
EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views
Yuhang Yang
Wei Zhai
Chengfeng Wang
Chengjun Yu
Yang Cao
Zheng-jun Zha
40
5
0
22 May 2024
Step Differences in Instructional Video
Tushar Nagarajan
Lorenzo Torresani
VGen
27
5
0
24 Apr 2024
HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision
Siddhant Bansal
Michael Wray
Dima Damen
38
3
0
15 Apr 2024
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
Changan Chen
Kumar Ashutosh
Rohit Girdhar
David F. Harwath
Kristen Grauman
EgoV
SSL
26
6
0
08 Apr 2024
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
Yingsen Zeng
Yujie Zhong
Chengjian Feng
Lin Ma
58
7
0
07 Apr 2024
SnAG: Scalable and Accurate Video Grounding
Fangzhou Mu
Sicheng Mo
Yin Li
39
8
0
02 Apr 2024
VideoDistill: Language-aware Vision Distillation for Video Question Answering
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
VGen
42
1
0
01 Apr 2024
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva
Fadime Sener
Edoardo Remelli
Bugra Tekin
Eric Sauser
Bernt Schiele
Shugao Ma
VLM
EgoV
39
1
0
28 Mar 2024
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
Yifei Huang
Guo Chen
Jilan Xu
Mingfang Zhang
Lijin Yang
...
Hongjie Zhang
Lu Dong
Yali Wang
Limin Wang
Yu Qiao
EgoV
60
36
0
24 Mar 2024
POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World
Boshen Xu
Sipeng Zheng
Qin Jin
44
7
0
09 Mar 2024
Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos
Tarun Kalluri
Bodhisattwa Prasad Majumder
Manmohan Chandraker
VLM
29
4
0
08 Mar 2024
Spatiotemporal Predictive Pre-training for Robotic Motor Control
Jiange Yang
Bei Liu
Jianlong Fu
Bocheng Pan
Gangshan Wu
Limin Wang
40
10
0
08 Mar 2024
A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives
Simone Alberto Peirone
Francesca Pistilli
A. Alliegro
Giuseppe Averta
EgoV
27
4
0
05 Mar 2024
A SOUND APPROACH: Using Large Language Models to generate audio descriptions for egocentric text-audio retrieval
Andreea-Maria Oncescu
João F. Henriques
Andrew Zisserman
Samuel Albanie
A. Sophia Koepke
19
5
0
29 Feb 2024
Video ReCap: Recursive Captioning of Hour-Long Videos
Md. Mohaiminul Islam
Ngan Ho
Xitong Yang
Tushar Nagarajan
Lorenzo Torresani
Gedas Bertasius
VGen
VLM
29
44
0
20 Feb 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
33
29
0
20 Feb 2024
Video Editing for Video Retrieval
Bin Zhu
Kevin Flanagan
A. Fragomeni
Michael Wray
Dima Damen
CLIP
29
0
0
04 Feb 2024
Exploring Missing Modality in Multimodal Egocentric Datasets
Merey Ramazanova
Alejandro Pardo
Humam Alwassel
Bernard Ghanem
EgoV
28
4
0
21 Jan 2024
GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition
Guangzhao Dai
Xiangbo Shu
Wenhao Wu
Rui Yan
Jiachao Zhang
VLM
16
5
0
18 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro
LLMAG
60
35
0
16 Jan 2024
Detours for Navigating Instructional Videos
Kumar Ashutosh
Zihui Xue
Tushar Nagarajan
Kristen Grauman
21
6
0
03 Jan 2024
Retrieval-Augmented Egocentric Video Captioning
Jilan Xu
Yifei Huang
Junlin Hou
Guo Chen
Yue Zhang
Rui Feng
Weidi Xie
EgoV
43
29
0
01 Jan 2024
A Simple LLM Framework for Long-Range Video Question-Answering
Ce Zhang
Taixi Lu
Md. Mohaiminul Islam
Ziyang Wang
Shoubin Yu
Mohit Bansal
Gedas Bertasius
100
80
0
28 Dec 2023
A Strong Baseline for Temporal Video-Text Alignment
Zeqian Li
Qirui Chen
Tengda Han
Ya-Qin Zhang
Yanfeng Wang
Weidi Xie
AI4TS
VGen
24
5
0
21 Dec 2023
Learning Object State Changes in Videos: An Open-World Perspective
Zihui Xue
Kumar Ashutosh
Kristen Grauman
VGen
32
18
0
19 Dec 2023
Grounded Question-Answering in Long Egocentric Videos
Shangzhe Di
Weidi Xie
37
23
0
11 Dec 2023
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
Tanveer Hannan
Md. Mohaiminul Islam
Thomas Seidl
Gedas Bertasius
28
3
0
11 Dec 2023
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
Yi Chen
Yuying Ge
Yixiao Ge
Mingyu Ding
Bohao Li
Rui Wang
Rui-Lan Xu
Ying Shan
Xihui Liu
LLMAG
ELM
LRM
19
9
0
11 Dec 2023
LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos
Ying Wang
Yanlai Yang
Mengye Ren
41
15
0
07 Dec 2023
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
Bolin Lai
Xiaoliang Dai
Lawrence Chen
Guan Pang
James M. Rehg
Miao Liu
33
14
0
06 Dec 2023
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan
Zitian Tang
Zhiqiu Yu
Chen Sun
53
1
0
30 Nov 2023
PALM: Predicting Actions through Language Models
Sanghwan Kim
Daoji Huang
Yongqin Xian
Otmar Hilliges
Luc Van Gool
Xi Wang
VLM
19
10
0
29 Nov 2023
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos
Takehiko Ohkawa
Takuma Yagi
Taichi Nishimura
Ryosuke Furuta
Atsushi Hashimoto
Yoshitaka Ushiku
Yoichi Sato
EgoV
36
8
0
28 Nov 2023
ViT-Lens: Towards Omni-modal Representations
Weixian Lei
Yixiao Ge
Kun Yi
Jianfeng Zhang
Difei Gao
Dylan Sun
Yuying Ge
Ying Shan
Mike Zheng Shou
21
18
0
27 Nov 2023
Vamos: Versatile Action Models for Video Understanding
Shijie Wang
Qi Zhao
Minh Quan Do
Nakul Agarwal
Kwonjoon Lee
Chen Sun
27
19
0
22 Nov 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
26
63
0
30 Oct 2023
Learning Temporal Sentence Grounding From Narrated EgoVideos
Kevin Flanagan
Dima Damen
Michael Wray
23
3
0
26 Oct 2023
Human-oriented Representation Learning for Robotic Manipulation
Mingxiao Huo
Mingyu Ding
Chenfeng Xu
Thomas Tian
Xinghao Zhu
Yao Mu
Lingfeng Sun
Masayoshi Tomizuka
Wei Zhan
SSL
36
12
0
04 Oct 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
29
15
0
28 Sep 2023
VidChapters-7M: Video Chapters at Scale
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
13
26
0
25 Sep 2023
Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention
Burak Satar
Huaiyu Zhu
Hanwang Zhang
Joo-Hwee Lim
CML
30
0
0
17 Sep 2023
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding
Yue Xu
Yong-Lu Li
Zhemin Huang
Michael Xu Liu
Cewu Lu
Yu-Wing Tai
Chi-Keung Tang
EgoV
20
9
0
05 Sep 2023