Grounded Question-Answering in Long Egocentric Videos

11 December 2023
Shangzhe Di
Weidi Xie
arXiv: 2312.06505

Papers citing "Grounded Question-Answering in Long Egocentric Videos"

Human-inspired Perspectives: A Survey on AI Long-term Memory
Zihong He
Weizhe Lin
Hao Zheng
Fan Zhang
Matt Jones
Laurence Aitchison
X. Xu
Miao Liu
Per Ola Kristensson
Junxiao Shen
192
3
0
01 Nov 2024
MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA
Hanrong Ye
Haotian Zhang
Erik Daxberger
Lin Chen
Zongyu Lin
...
Haoxuan You
Dan Xu
Zhe Gan
Jiasen Lu
Yinfei Yang
EgoV, MLLM
134
14
0
09 Oct 2024
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
K. Mangalam
Raiymbek Akshulakov
Jitendra Malik
105
293
0
17 Aug 2023
GroundNLQ @ Ego4D Natural Language Queries Challenge 2023
Zhijian Hou
Lei Ji
Difei Gao
Wanjun Zhong
Kun Yan
Chong Li
W. Chan
Chong-Wah Ngo
Nan Duan
Mike Zheng Shou
66
17
0
27 Jun 2023
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges
Guo Chen
Sen Xing
Zhe Chen
Yi Wang
Kunchang Li
...
Hongjie Zhang
Tong Lu
Yali Wang
Liming Wang
Yu Qiao
67
49
0
17 Nov 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM, LRM
194
3,146
0
20 Oct 2022
SAVi++: Towards End-to-End Object-Centric Learning from Real-World Videos
Gamaleldin F. Elsayed
Aravindh Mahendran
Sjoerd van Steenkiste
Klaus Greff
Michael C. Mozer
Thomas Kipf
VOS, OCL
129
142
0
15 Jun 2022
GLIPv2: Unifying Localization and Vision-Language Understanding
Haotian Zhang
Pengchuan Zhang
Xiaowei Hu
Yen-Chun Chen
Liunian Harold Li
Xiyang Dai
Lijuan Wang
Lu Yuan
Lei Li
Jianfeng Gao
ObjD, VLM
90
300
0
12 Jun 2022
Egocentric Video-Language Pretraining
Kevin Qinghong Lin
Alex Jinpeng Wang
Mattia Soldan
Michael Wray
Rui Yan
...
Hongfa Wang
Dima Damen
Guohao Li
Wei Liu
Mike Zheng Shou
VLM, EgoV
75
205
0
03 Jun 2022
ActionFormer: Localizing Moments of Actions with Transformers
Chen-Lin Zhang
Jianxin Wu
Yin Li
ViT
70
342
0
16 Feb 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
399
1,107
0
13 Oct 2021
Anticipative Video Transformer
Rohit Girdhar
Kristen Grauman
ViT
65
211
0
03 Jun 2021
Learning to Rehearse in Long Sequence Memorization
Zhu Zhang
Chang Zhou
Jianxin Ma
Zhijie Lin
Jingren Zhou
Hongxia Yang
Zhou Zhao
RALM
31
9
0
02 Jun 2021
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
89
498
0
18 May 2021
Learning to Track with Object Permanence
P. Tokmakov
Jie Li
Wolfram Burgard
Adrien Gaidon
VOT
89
206
0
26 Mar 2021
Object-Centric Learning with Slot Attention
Francesco Locatello
Dirk Weissenborn
Thomas Unterthiner
Aravindh Mahendran
G. Heigold
Jakob Uszkoreit
Alexey Dosovitskiy
Thomas Kipf
OCL
225
856
0
26 Jun 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Linjie Li
Yen-Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
MLLM, VLM, OffRL, AI4TS
118
503
0
01 May 2020
Span-based Localizing Network for Natural Language Video Localization
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
87
315
0
29 Apr 2020
Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression
Zhaohui Zheng
Ping Wang
Wei Liu
Jinze Li
Rongguang Ye
Dongwei Ren
NoLa
109
3,705
0
19 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
445
20,298
0
23 Oct 2019
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers
Iryna Gurevych
1.3K
12,295
0
27 Aug 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
62
340
0
22 Aug 2019
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering
Zhou Yu
D. Xu
Jun-chen Yu
Ting Yu
Zhou Zhao
Yueting Zhuang
Dacheng Tao
110
470
0
06 Jun 2019
Grounded Human-Object Interaction Hotspots from Video
Tushar Nagarajan
Christoph Feichtenhofer
Kristen Grauman
76
159
0
11 Dec 2018
SlowFast Networks for Video Recognition
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
166
3,282
0
10 Dec 2018
Localizing Moments in Video with Natural Language
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
115
949
0
04 Aug 2017
An Overview of Multi-Task Learning in Deep Neural Networks
Sebastian Ruder
CVBM
151
2,830
0
15 Jun 2017
TALL: Temporal Activity Localization via Language Query
J. Gao
Chen Sun
Zhenheng Yang
Ram Nevatia
123
820
0
05 May 2017
Dense-Captioning Events in Videos
Ranjay Krishna
Kenji Hata
F. Ren
Li Fei-Fei
Juan Carlos Niebles
139
1,248
0
02 May 2017