Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2308.10402
Cited By
Simple Baselines for Interactive Video Retrieval with Questions and Answers
21 August 2023
Kaiqu Liang
Samuel Albanie
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Simple Baselines for Interactive Video Retrieval with Questions and Answers"
34 / 34 papers shown
Title
Learning to Retrieve Videos by Asking Questions
Avinash Madasu
Junier Oliva
Gedas Bertasius
VGen
48
16
0
11 May 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLM
BDL
VLM
CLIP
527
4,343
0
28 Jan 2022
Multi-Query Video Retrieval
Zeyu Wang
Yu Wu
Karthik Narasimhan
Olga Russakovsky
76
17
0
10 Jan 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
94
212
0
07 Jan 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
342
1,701
0
15 Oct 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Guosheng Lin
FaML
186
1,953
0
16 Jul 2021
TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval
Ioana Croitoru
Simion-Vlad Bogolin
Marius Leordeanu
Hailin Jin
Andrew Zisserman
Samuel Albanie
Yang Liu
VGen
63
125
0
16 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
136
1,173
0
01 Apr 2021
Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval with Partial Query
Guanyu Cai
Jun Zhang
Xinyang Jiang
Yifei Gong
Lianghua He
Fufu Yu
Pai Peng
Xiaowei Guo
Feiyue Huang
Xing Sun
58
13
0
02 Mar 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
CLIP
118
661
0
11 Feb 2021
Location-aware Graph Convolutional Networks for Video Question Answering
Deng Huang
Peihao Chen
Runhao Zeng
Qing Du
Mingkui Tan
Chuang Gan
GNN
BDL
83
175
0
07 Aug 2020
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data
Michael Cogswell
Jiasen Lu
Rishabh Jain
Stefan Lee
Devi Parikh
Dhruv Batra
VLM
EgoV
58
15
0
24 Jul 2020
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
531
608
0
21 Jul 2020
Dialog Policy Learning for Joint Clarification and Active Learning Queries
Aishwarya Padmakumar
Raymond J. Mooney
29
8
0
09 Jun 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
105
1,938
0
13 Apr 2020
Hierarchical Conditional Relation Networks for Video Question Answering
T. Le
Vuong Le
Svetha Venkatesh
T. Tran
77
259
0
25 Feb 2020
End-to-End Learning of Visual Representations from Uncurated Instructional Videos
Antoine Miech
Jean-Baptiste Alayrac
Lucas Smaira
Ivan Laptev
Josef Sivic
Andrew Zisserman
VGen
SSL
114
711
0
13 Dec 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
73
389
0
31 Jul 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
108
1,200
0
07 Jun 2019
Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering
Chenyou Fan
Xiaofan Zhang
Shu Zhang
Wensheng Wang
Chi Zhang
Heng-Chiao Huang
49
278
0
08 Apr 2019
Information Maximizing Visual Question Generation
Ranjay Krishna
Michael S. Bernstein
Li Fei-Fei
92
95
0
27 Mar 2019
Audio-Visual Scene-Aware Dialog
Huda AlAmri
Vincent Cartillier
Abhishek Das
Jue Wang
A. Cherian
...
Tim K. Marks
Chiori Hori
Peter Anderson
Stefan Lee
Devi Parikh
VGen
52
192
0
25 Jan 2019
A Joint Sequence Fusion Model for Video Question Answering and Retrieval
Youngjae Yu
Jongseok Kim
Gunhee Kim
77
345
0
07 Aug 2018
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data
Antoine Miech
Ivan Laptev
Josef Sivic
61
234
0
07 Apr 2018
Motion-Appearance Co-Memory Networks for Video Question Answering
J. Gao
Runzhou Ge
Kan Chen
Ram Nevatia
111
241
0
29 Mar 2018
Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog
Sang-Woo Lee
Y. Heo
Byoung-Tak Zhang
62
31
0
12 Feb 2018
Localizing Moments in Video with Natural Language
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
110
946
0
04 Aug 2017
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
121
4,215
0
25 Jul 2017
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
Y. Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
75
555
0
14 Apr 2017
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
Abhishek Das
Satwik Kottur
J. M. F. Moura
Stefan Lee
Dhruv Batra
OffRL
110
425
0
20 Mar 2017
Visual Dialog
Abhishek Das
Satwik Kottur
Khushi Gupta
Avi Singh
Deshraj Yadav
José M. F. Moura
Devi Parikh
Dhruv Batra
142
997
0
26 Nov 2016
Adversarial Feature Learning
Jiasen Lu
Philipp Krahenbuhl
Trevor Darrell
GAN
109
1,831
0
31 May 2016
Stacked Attention Networks for Image Question Answering
Zichao Yang
Xiaodong He
Jianfeng Gao
Li Deng
Alex Smola
BDL
107
1,882
0
07 Nov 2015
A Dataset for Movie Description
Anna Rohrbach
Marcus Rohrbach
Niket Tandon
Bernt Schiele
VGen
110
500
0
12 Jan 2015
1