arXiv:1501.02530
A Dataset for Movie Description
12 January 2015
Anna Rohrbach
Marcus Rohrbach
Niket Tandon
Bernt Schiele
VGen
Papers citing "A Dataset for Movie Description"
50 / 257 papers shown
Title
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
Maksim Dzabraev
M. Kalashnikov
Stepan Alekseevich Komkov
Aleksandr Petiushko
24
128
0
19 Mar 2021
On Semantic Similarity in Video Retrieval
Michael Wray
Hazel Doughty
Dima Damen
33
66
0
18 Mar 2021
A Straightforward Framework For Video Retrieval Using CLIP
Jesús Andrés Portillo-Quintero
J. C. Ortíz-Bayliss
Hugo Terashima-Marín
CLIP
324
117
0
24 Feb 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Mohit Bansal
Jingjing Liu
CLIP
46
648
0
11 Feb 2021
The Role of the Input in Natural Language Video Description
S. Cascianelli
G. Costante
Alessandro Devo
Thomas Alessandro Ciarfuglia
P. Valigi
M. L. Fravolini
21
5
0
09 Feb 2021
Narration Generation for Cartoon Videos
Nikos Papasarantopoulos
Shay B. Cohen
VGen
25
2
0
17 Jan 2021
Recent Advances in Video Question Answering: A Review of Datasets and Methods
Devshree Patel
Ratnam Parikh
Yesha Shastri
15
18
0
15 Jan 2021
Learning Temporal Dynamics from Cycles in Narrated Video
Dave Epstein
Jiajun Wu
Cordelia Schmid
Chen Sun
AI4TS
38
14
0
07 Jan 2021
DVD: A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue
Hung Le
Chinnadhurai Sankar
Seungwhan Moon
Ahmad Beirami
A. Geramifard
Satwik Kottur
VGen
39
18
0
01 Jan 2021
Movie Summarization via Sparse Graph Construction
Pinelopi Papalampidi
Frank Keller
Mirella Lapata
27
32
0
14 Dec 2020
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish
Begum Citamak
Ozan Caglayan
Menekse Kuyu
Erkut Erdem
Aykut Erdem
Pranava Madhyastha
Lucia Specia
31
8
0
13 Dec 2020
A Comprehensive Review on Recent Methods and Challenges of Video Description
Ashutosh Kumar Singh
Thoudam Doren Singh
Sivaji Bandyopadhyay
3DV
VLM
19
5
0
30 Nov 2020
QuerYD: A video dataset with high-quality text and audio narrations
Andreea-Maria Oncescu
João F. Henriques
Yang Liu
Andrew Zisserman
Samuel Albanie
VGen
22
11
0
22 Nov 2020
Video Action Understanding
Matthew Hutchinson
V. Gadepally
43
20
0
13 Oct 2020
Dual Encoding for Video Retrieval by Text
Jianfeng Dong
Xirong Li
Chaoxi Xu
Xun Yang
Gang Yang
Xun Wang
Meng Wang
24
2
0
10 Sep 2020
Identity-Aware Multi-Sentence Video Description
J. S. Park
Trevor Darrell
Anna Rohrbach
26
17
0
22 Aug 2020
Text-based Localization of Moments in a Video Corpus
Sudipta Paul
Niluthpol Chowdhury Mithun
Amit K. Roy-Chowdhury
10
14
0
20 Aug 2020
Poet: Product-oriented Video Captioner for E-commerce
Shengyu Zhang
Ziqi Tan
Jin Yu
Zhou Zhao
Kun Kuang
Jie Liu
Jingren Zhou
Hongxia Yang
Fei Wu
14
34
0
16 Aug 2020
Enriching Video Captions With Contextual Text
Philipp Rimle
Pelin Dogan
Markus Gross
30
3
0
29 Jul 2020
Active Learning for Video Description With Cluster-Regularized Ensemble Ranking
David M. Chan
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
VLM
14
6
0
27 Jul 2020
MovieNet: A Holistic Dataset for Movie Understanding
Qingqiu Huang
Yu Xiong
Anyi Rao
Jiaze Wang
Dahua Lin
VGen
45
235
0
21 Jul 2020
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Karteek Alahari
Cordelia Schmid
ViT
430
596
0
21 Jul 2020
Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions
Noa Garcia
Yuta Nakashima
26
32
0
17 Jul 2020
Tree-Augmented Cross-Modal Encoding for Complex-Query Video Retrieval
Xun Yang
Jianfeng Dong
Yixin Cao
Xun Wang
Meng Wang
Tat-Seng Chua
33
137
0
06 Jul 2020
Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training
Yingwei Pan
Yehao Li
Jianjie Luo
Jun Xu
Ting Yao
Tao Mei
38
57
0
05 Jul 2020
Comprehensive Information Integration Modeling Framework for Video Titling
Shengyu Zhang
Ziqi Tan
Jin Yu
Zhou Zhao
Kun Kuang
Tan Jiang
Jingren Zhou
Hongxia Yang
Fei Wu
31
40
0
24 Jun 2020
Rescaling Egocentric Vision
Dima Damen
Hazel Doughty
G. Farinella
Antonino Furnari
Evangelos Kazakos
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
19
437
0
23 Jun 2020
The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines
Dima Damen
Hazel Doughty
G. Farinella
Sanja Fidler
Antonino Furnari
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
23
225
0
29 Apr 2020
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning
Elad Amrani
Rami Ben-Ari
Daniel Rotman
A. Bronstein
17
121
0
06 Mar 2020
ManyModalQA: Modality Disambiguation and QA over Diverse Inputs
Darryl Hannan
Akshay Jain
Mohit Bansal
AAML
38
57
0
22 Jan 2020
On the Evaluation of Intelligent Process Automation
Deborah Ferreira
Julia Rozanova
K. Dubba
Dell Zhang
André Freitas
9
9
0
08 Jan 2020
End-to-End Learning of Visual Representations from Uncurated Instructional Videos
Antoine Miech
Jean-Baptiste Alayrac
Lucas Smaira
Ivan Laptev
Josef Sivic
Andrew Zisserman
VGen
SSL
42
703
0
13 Dec 2019
Assessing the Robustness of Visual Question Answering Models
Jia-Hong Huang
Modar Alfadly
Bernard Ghanem
M. Worring
AAML
OOD
23
23
0
30 Nov 2019
A Graph-Based Framework to Bridge Movies and Synopses
Yu Xiong
Chengyi Zhang
Lingfeng Guo
Hang Zhou
Bolei Zhou
Dahua Lin
27
62
0
24 Oct 2019
Embodied Language Grounding with 3D Visual Feature Representations
Mihir Prabhudesai
H. Tung
Syed Ashar Javed
Maximilian Sieb
Adam W. Harley
Katerina Fragkiadaki
25
21
0
02 Oct 2019
Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention
Cristian Rodriguez-Opazo
Edison Marrese-Taylor
F. Saleh
Hongdong Li
Stephen Gould
30
147
0
20 Aug 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
36
387
0
31 Jul 2019
Finding Moments in Video Collections Using Natural Language
Victor Escorcia
Mattia Soldan
Josef Sivic
Bernard Ghanem
Bryan C. Russell
31
6
0
30 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
25
133
0
22 Jul 2019
Cross-Lingual Transfer Learning for Question Answering
Chia-Hsuan Lee
Hung-yi Lee
28
23
0
13 Jul 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
27
1,175
0
07 Jun 2019
Synthetic Defocus and Look-Ahead Autofocus for Casual Videography
X. Zhang
Kevin Blackburn-Matzen
Vivien Nguyen
Dillon Yao
You Zhang
Ren Ng
VGen
26
37
0
15 May 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
32
539
0
06 Apr 2019
M-VAD Names: a Dataset for Video Captioning with Naming
S. Pini
Marcella Cornia
Federico Bolelli
Lorenzo Baraldi
Rita Cucchiara
27
29
0
04 Mar 2019
Hierarchical LSTMs with Adaptive Attention for Visual Captioning
Jingkuan Song
Xiangpeng Li
Lianli Gao
Heng Tao Shen
23
221
0
26 Dec 2018
MTLE: A Multitask Learning Encoder of Visual Feature Representations for Video and Movie Description
Oliver A. Nina
Washington Garcia
Scott Clouse
Alper Yilmaz
23
4
0
19 Sep 2018
Dual Encoding for Zero-Example Video Retrieval
Jianfeng Dong
Xirong Li
Chaoxi Xu
S. Ji
Yuan He
Gang Yang
Xun Wang
30
268
0
17 Sep 2018
LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts
Shuming Ma
Lei Cui
Damai Dai
Furu Wei
Xu Sun
VGen
28
61
0
13 Sep 2018
TVQA: Localized, Compositional Video Question Answering
Jie Lei
Licheng Yu
Mohit Bansal
Tamara L. Berg
36
617
0
05 Sep 2018
A Joint Sequence Fusion Model for Video Question Answering and Retrieval
Youngjae Yu
Jongseok Kim
Gunhee Kim
40
340
0
07 Aug 2018