Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1604.02748
Cited By
TGIF: A New Dataset and Benchmark on Animated GIF Description
10 April 2016
Yuncheng Li
Yale Song
Liangliang Cao
Joel R. Tetreault
Larry Goldberg
A. Jaimes
Jiebo Luo
Re-assign community
ArXiv
PDF
HTML
Papers citing
"TGIF: A New Dataset and Benchmark on Animated GIF Description"
31 / 31 papers shown
Title
VideoVista-CulturalLingo: 360
∘
^\circ
∘
Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension
Xinyu Chen
Yunxin Li
Haoyuan Shi
Baotian Hu
Wenhan Luo
Yaowei Wang
M. Zhang
ELM
62
0
0
23 Apr 2025
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
Qizhe Zhang
Aosong Cheng
Ming Lu
Zhiyong Zhuo
Minqi Wang
Jiajun Cao
Shaobo Guo
Qi She
Shanghang Zhang
VLM
90
11
0
02 Dec 2024
VideoSAVi: Self-Aligned Video Language Models without Human Supervision
Yogesh Kulkarni
Pooyan Fazli
VLM
100
2
0
01 Dec 2024
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding
Yiming Zhang
Zhuokai Zhao
Zhaorun Chen
Zenghui Ding
Xianjun Yang
Yining Sun
186
1
0
21 Nov 2024
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
Mingze Xu
Mingfei Gao
Zhe Gan
Hong-You Chen
Zhengfeng Lai
Haiming Gang
Kai Kang
Afshin Dehghan
50
48
0
22 Jul 2024
Tarsier: Recipes for Training and Evaluating Large Video Description Models
Jiawei Wang
Liping Yuan
Yuchen Zhang
33
52
0
30 Jun 2024
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
Yunxin Li
Xinyu Chen
Baotian Hu
Longyue Wang
Haoyuan Shi
Min-Ling Zhang
MLLM
LRM
42
25
0
17 Jun 2024
DAM: Dynamic Adapter Merging for Continual Video QA Learning
Feng Cheng
Ziyang Wang
Yi-Lin Sung
Yan-Bo Lin
Mohit Bansal
Gedas Bertasius
CLL
MoMe
31
10
0
13 Mar 2024
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
Shehan Munasinghe
Rusiru Thushara
Muhammad Maaz
H. Rasheed
Salman Khan
Mubarak Shah
Fahad Khan
VLM
MLLM
24
34
0
22 Nov 2023
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
Jinjin Xu
Liwu Xu
Yuzhe Yang
Xiang Li
Fanyi Wang
Yanchun Xie
Yi-Jie Huang
Yaqian Li
MoE
MLLM
VLM
27
12
0
09 Nov 2023
Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation
Jiaxi Gu
Shicong Wang
Haoyu Zhao
Tianyi Lu
Xing Zhang
Zuxuan Wu
Songcen Xu
Wei Zhang
Yu-Gang Jiang
Hang Xu
DiffM
VGen
36
43
0
07 Sep 2023
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
...
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
38
309
0
06 Dec 2022
SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation
R. Ramos
Bruno Martins
Desmond Elliott
Yova Kementchedjhieva
VLM
30
86
0
30 Sep 2022
(Un)likelihood Training for Interpretable Embedding
Jiaxin Wu
Chong-Wah Ngo
W. Chan
Zhijian Hou
12
2
0
01 Jul 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
34
226
0
16 Jun 2022
Structured Two-stream Attention Network for Video Question Answering
Lianli Gao
Pengpeng Zeng
Jingkuan Song
Yuan-Fang Li
Wu Liu
Tao Mei
Heng Tao Shen
25
68
0
02 Jun 2022
Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation
Wangbo Zhao
Kai Wang
Xiangxiang Chu
Fuzhao Xue
Xinchao Wang
Yang You
29
21
0
06 Apr 2022
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization
Alexander Kunitsyn
M. Kalashnikov
Maksim Dzabraev
Andrei Ivaniuta
28
16
0
14 Mar 2022
A Bilingual, OpenWorld Video Text Dataset and End-to-end Video Text Spotter with Transformer
Weijia Wu
Yuanqiang Cai
Debing Zhang
Sibo Wang
Zhuang Li
Jiahong Li
Yejun Tang
Hong Zhou
30
29
0
09 Dec 2021
Question Answering Survey: Directions, Challenges, Datasets, Evaluation Matrices
Hariom A. Pandya
Brijesh S. Bhatt
38
27
0
07 Dec 2021
Happy Dance, Slow Clap: Using Reaction GIFs to Predict Induced Affect on Twitter
Boaz Shmueli
Soumya Ray
Lun-Wei Ku
13
6
0
20 May 2021
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
Maksim Dzabraev
M. Kalashnikov
Stepan Alekseevich Komkov
Aleksandr Petiushko
13
128
0
19 Mar 2021
Dual Encoding for Video Retrieval by Text
Jianfeng Dong
Xirong Li
Chaoxi Xu
Xun Yang
Gang Yang
Xun Wang
Meng Wang
19
2
0
10 Sep 2020
Better Captioning with Sequence-Level Exploration
Jia Chen
Qin Jin
29
12
0
08 Mar 2020
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
15
540
0
06 Apr 2019
Dual Encoding for Zero-Example Video Retrieval
Jianfeng Dong
Xirong Li
Chaoxi Xu
S. Ji
Yuan He
Gang Yang
Xun Wang
24
268
0
17 Sep 2018
DeepStory: Video Story QA by Deep Embedded Memory Networks
Kyung-Min Kim
Min-Oh Heo
Seongho Choi
Byoung-Tak Zhang
19
174
0
04 Jul 2017
The Forgettable-Watcher Model for Video Question Answering
Hongyang Xue
Zhou Zhao
Deng Cai
19
9
0
03 May 2017
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
Y. Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
19
545
0
14 Apr 2017
Sync-DRAW: Automatic Video Generation using Deep Recurrent Attentive Architectures
Gaurav Mittal
Tanya Marwah
V. Balasubramanian
VGen
DiffM
29
67
0
30 Nov 2016
Movie Description
Anna Rohrbach
Atousa Torabi
Marcus Rohrbach
Niket Tandon
C. Pal
Hugo Larochelle
Aaron Courville
Bernt Schiele
3DV
VGen
30
353
0
12 May 2016
1