1505.01809
Cited By
Language Models for Image Captioning: The Quirks and What Works
7 May 2015
Jacob Devlin
Hao Cheng
Hao Fang
Saurabh Gupta
Li Deng
Xiaodong He
Geoffrey Zweig
Margaret Mitchell
Papers citing
"Language Models for Image Captioning: The Quirks and What Works"
45 / 45 papers shown
Title
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
Cheng Yang
Yang Sui
Jinqi Xiao
Lingyi Huang
Yu Gong
...
Jinghua Yan
Y. Bai
P. Sadayappan
Xia Hu
Bo Yuan
VLM
59
0
0
24 Mar 2025
Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores
Chantal Shaib
Joe Barrow
Jiuding Sun
Alexa F. Siu
Byron C. Wallace
A. Nenkova
66
33
0
01 Mar 2024
Text-Only Training for Visual Storytelling
Yuechen Wang
Wen-gang Zhou
Zhenbo Lu
Houqiang Li
DiffM
28
2
0
17 Aug 2023
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
Luke Vilnis
Yury Zemlyanskiy
Patrick C. Murray
Alexandre Passos
Sumit Sanghai
59
9
0
18 Oct 2022
It Isn't Sh!tposting, It's My CAT Posting
Parthsarthi Rawat
Sayan Das
Jorge Aguirre
Akhil Daphara
ViT
22
0
0
18 May 2022
Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention
S. Tan
Runpei Dong
Kaisheng Ma
22
2
0
03 Nov 2021
Multi-Modal Image Captioning for the Visually Impaired
Hiba Ahsan
Nikita Bhalla
Daivat Bhatt
Kaivankumar Shah
22
20
0
17 May 2021
Every Model Learned by Gradient Descent Is Approximately a Kernel Machine
Pedro M. Domingos
MLT
29
70
0
30 Nov 2020
Gaussian Smoothen Semantic Features (GSSF) -- Exploring the Linguistic Aspects of Visual Captioning in Indian Languages (Bengali) Using MSCOCO Framework
C. Sur
24
7
0
16 Feb 2020
MRRC: Multiple Role Representation Crossover Interpretation for Image Captioning With R-CNN Feature Distribution Composition (FDC)
C. Sur
25
16
0
15 Feb 2020
Going Beneath the Surface: Evaluating Image Captioning for Grammaticality, Truthfulness and Diversity
Huiyuan Xie
Tom Sherborne
A. Kuhnle
Ann A. Copestake
DiffM
22
9
0
19 Dec 2019
Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings
Gregor Wiedemann
Steffen Remus
Avi Chawla
Chris Biemann
19
174
0
23 Sep 2019
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators
Kuang-Huei Lee
Hamid Palangi
Xi Chen
Houdong Hu
Jianfeng Gao
VLM
24
37
0
22 Sep 2019
Compositional Generalization in Image Captioning
Mitja Nikolaus
Mostafa Abdou
Matthew Lamm
Rahul Aralikatte
Desmond Elliott
CoGe
24
49
0
10 Sep 2019
MeetUp! A Corpus of Joint Activity Dialogues in a Visual Environment
N. Ilinykh
Sina Zarrieß
David Schlangen
24
43
0
11 Jul 2019
Sequence-to-Sequence Models for Data-to-Text Natural Language Generation: Word- vs. Character-based Processing and Output Diversity
Glorianna Jagfeld
Sabrina Jenne
Ngoc Thang Vu
AIMat
38
24
0
11 Oct 2018
A Comprehensive Survey of Deep Learning for Image Captioning
Md. Zakir Hossain
Ferdous Sohel
M. Shiratuddin
Hamid Laga
VLM
3DV
33
760
0
06 Oct 2018
Stacked Cross Attention for Image-Text Matching
Kuang-Huei Lee
Xi Chen
G. Hua
Houdong Hu
Xiaodong He
15
1,140
0
21 Mar 2018
Neural Aesthetic Image Reviewer
Wenshan Wang
Su Yang
Weishan Zhang
Jiulong Zhang
19
38
0
28 Feb 2018
Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning
Hongge Chen
Huan Zhang
Pin-Yu Chen
Jinfeng Yi
Cho-Jui Hsieh
GAN
AAML
29
49
0
06 Dec 2017
Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space
Liwei Wang
A. Schwing
Svetlana Lazebnik
CoGe
31
175
0
19 Nov 2017
AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding
Jiahong Wu
He Zheng
Bo-Lu Zhao
Yixin Li
Baoming Yan
...
Shipei Zhou
G. Lin
Yanwei Fu
Yizhou Wang
Yonggang Wang
VLM
32
149
0
17 Nov 2017
Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning
Yang Xian
Yingli Tian
VLM
25
22
0
15 Sep 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
15
2,859
0
26 May 2017
Towards a Visual Privacy Advisor: Understanding and Predicting Privacy Risks in Images
Rakshith Shetty
Bernt Schiele
Mario Fritz
32
223
0
30 Mar 2017
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
Albert Gatt
E. Krahmer
LM&MA
ELM
27
810
0
29 Mar 2017
Where to put the Image in an Image Caption Generator
Marc Tanti
Albert Gatt
K. Camilleri
44
96
0
27 Mar 2017
Recurrent Models for Situation Recognition
Arun Mallya
Svetlana Lazebnik
14
30
0
18 Mar 2017
MAT: A Multimodal Attentive Translator for Image Captioning
Chang Liu
F. Sun
Changhu Wang
Feng Wang
Alan Yuille
17
58
0
18 Feb 2017
Guided Open Vocabulary Image Captioning with Constrained Beam Search
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
21
232
0
02 Dec 2016
Semantic Regularisation for Recurrent Image Annotation
Feng Liu
Tao Xiang
Timothy M. Hospedales
Wankou Yang
Changyin Sun
29
103
0
16 Nov 2016
Boosting Image Captioning with Attributes
Ting Yao
Yingwei Pan
Yehao Li
Zhaofan Qiu
Tao Mei
VLM
33
620
0
05 Nov 2016
Seeing with Humans: Gaze-Assisted Neural Image Captioning
Yusuke Sugano
Andreas Bulling
18
68
0
18 Aug 2016
SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
34
1,883
0
29 Jul 2016
Movie Description
Anna Rohrbach
Atousa Torabi
Marcus Rohrbach
Niket Tandon
C. Pal
Hugo Larochelle
Aaron Courville
Bernt Schiele
3DV
VGen
32
353
0
12 May 2016
Visual Storytelling
Ting-Hao 'Kenneth' Huang
Francis Ferraro
N. Mostafazadeh
Ishan Misra
...
C. L. Zitnick
Devi Parikh
Lucy Vanderwende
Michel Galley
Margaret Mitchell
VGen
16
464
0
13 Apr 2016
Rich Image Captioning in the Wild
Kenneth Tran
Xiaodong He
Lei Zhang
Jian Sun
Cornelia Carapcea
Chris Thrasher
Chris Buehler
Chris Sienkiewicz
VLM
19
123
0
30 Mar 2016
Image Captioning and Visual Question Answering Based on Attributes and External Knowledge
Qi Wu
Chunhua Shen
Anton Van Den Hengel
Peng Wang
A. Dick
19
360
0
09 Mar 2016
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures
Raffaella Bernardi
Ruken Cakici
Desmond Elliott
Aykut Erdem
Erkut Erdem
Nazli Ikizler-Cinbis
Frank Keller
A. Muscat
Barbara Plank
EGVM
VLM
21
363
0
15 Jan 2016
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data
Lisa Anne Hendricks
Subhashini Venugopalan
Marcus Rohrbach
Raymond J. Mooney
Kate Saenko
Trevor Darrell
CoGe
16
284
0
17 Nov 2015
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
Huijuan Xu
Kate Saenko
22
760
0
17 Nov 2015
Describing Multimedia Content using Attention-based Encoder-Decoder Networks
Kyunghyun Cho
Aaron Courville
Yoshua Bengio
32
411
0
04 Jul 2015
Jointly Modeling Embedding and Translation to Bridge Video and Language
Yingwei Pan
Tao Mei
Ting Yao
Houqiang Li
Y. Rui
41
534
0
07 May 2015
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
Junhua Mao
Wenyuan Xu
Yi Yang
Jiang Wang
Zhiheng Huang
Alan Yuille
VLM
60
1,235
0
20 Dec 2014
Long-term Recurrent Convolutional Networks for Visual Recognition and Description
Jeff Donahue
Lisa Anne Hendricks
Marcus Rohrbach
Subhashini Venugopalan
S. Guadarrama
Kate Saenko
Trevor Darrell
VLM
55
6,032
0
17 Nov 2014