ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1505.01809
  4. Cited By
Language Models for Image Captioning: The Quirks and What Works

Language Models for Image Captioning: The Quirks and What Works

7 May 2015
Jacob Devlin
Hao Cheng
Hao Fang
Saurabh Gupta
Li Deng
Xiaodong He
Geoffrey Zweig
Margaret Mitchell
ArXivPDFHTML

Papers citing "Language Models for Image Captioning: The Quirks and What Works"

45 / 45 papers shown
Title
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
Cheng Yang
Yang Sui
Jinqi Xiao
Lingyi Huang
Yu Gong
...
Jinghua Yan
Y. Bai
P. Sadayappan
Xia Hu
Bo Yuan
VLM
59
0
0
24 Mar 2025
Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores
Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores
Chantal Shaib
Joe Barrow
Jiuding Sun
Alexa F. Siu
Byron C. Wallace
A. Nenkova
66
33
0
01 Mar 2024
Text-Only Training for Visual Storytelling
Text-Only Training for Visual Storytelling
Yuechen Wang
Wen-gang Zhou
Zhenbo Lu
Houqiang Li
DiffM
28
2
0
17 Aug 2023
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
Luke Vilnis
Yury Zemlyanskiy
Patrick C. Murray
Alexandre Passos
Sumit Sanghai
54
9
0
18 Oct 2022
It Isn't Sh!tposting, It's My CAT Posting
It Isn't Sh!tposting, It's My CAT Posting
Parthsarthi Rawat
Sayan Das
Jorge Aguirre
Akhil Daphara
ViT
17
0
0
18 May 2022
Multi-Glimpse Network: A Robust and Efficient Classification
  Architecture based on Recurrent Downsampled Attention
Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention
S. Tan
Runpei Dong
Kaisheng Ma
22
2
0
03 Nov 2021
Multi-Modal Image Captioning for the Visually Impaired
Multi-Modal Image Captioning for the Visually Impaired
Hiba Ahsan
Nikita Bhalla
Daivat Bhatt
Kaivankumar Shah
17
20
0
17 May 2021
Every Model Learned by Gradient Descent Is Approximately a Kernel
  Machine
Every Model Learned by Gradient Descent Is Approximately a Kernel Machine
Pedro M. Domingos
MLT
29
70
0
30 Nov 2020
Gaussian Smoothen Semantic Features (GSSF) -- Exploring the Linguistic
  Aspects of Visual Captioning in Indian Languages (Bengali) Using MSCOCO
  Framework
Gaussian Smoothen Semantic Features (GSSF) -- Exploring the Linguistic Aspects of Visual Captioning in Indian Languages (Bengali) Using MSCOCO Framework
C. Sur
19
7
0
16 Feb 2020
MRRC: Multiple Role Representation Crossover Interpretation for Image
  Captioning With R-CNN Feature Distribution Composition (FDC)
MRRC: Multiple Role Representation Crossover Interpretation for Image Captioning With R-CNN Feature Distribution Composition (FDC)
C. Sur
25
16
0
15 Feb 2020
Going Beneath the Surface: Evaluating Image Captioning for
  Grammaticality, Truthfulness and Diversity
Going Beneath the Surface: Evaluating Image Captioning for Grammaticality, Truthfulness and Diversity
Huiyuan Xie
Tom Sherborne
A. Kuhnle
Ann A. Copestake
DiffM
17
9
0
19 Dec 2019
Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with
  Contextualized Embeddings
Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings
Gregor Wiedemann
Steffen Remus
Avi Chawla
Chris Biemann
19
174
0
23 Sep 2019
Learning Visual Relation Priors for Image-Text Matching and Image
  Captioning with Neural Scene Graph Generators
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators
Kuang-Huei Lee
Hamid Palangi
Xi Chen
Houdong Hu
Jianfeng Gao
VLM
19
37
0
22 Sep 2019
Compositional Generalization in Image Captioning
Compositional Generalization in Image Captioning
Mitja Nikolaus
Mostafa Abdou
Matthew Lamm
Rahul Aralikatte
Desmond Elliott
CoGe
21
49
0
10 Sep 2019
MeetUp! A Corpus of Joint Activity Dialogues in a Visual Environment
MeetUp! A Corpus of Joint Activity Dialogues in a Visual Environment
N. Ilinykh
Sina Zarrieß
David Schlangen
19
43
0
11 Jul 2019
Sequence-to-Sequence Models for Data-to-Text Natural Language
  Generation: Word- vs. Character-based Processing and Output Diversity
Sequence-to-Sequence Models for Data-to-Text Natural Language Generation: Word- vs. Character-based Processing and Output Diversity
Glorianna Jagfeld
Sabrina Jenne
Ngoc Thang Vu
AIMat
33
24
0
11 Oct 2018
A Comprehensive Survey of Deep Learning for Image Captioning
A Comprehensive Survey of Deep Learning for Image Captioning
Md. Zakir Hossain
Ferdous Sohel
M. Shiratuddin
Hamid Laga
VLM
3DV
28
760
0
06 Oct 2018
Stacked Cross Attention for Image-Text Matching
Stacked Cross Attention for Image-Text Matching
Kuang-Huei Lee
Xi Chen
G. Hua
Houdong Hu
Xiaodong He
13
1,140
0
21 Mar 2018
Neural Aesthetic Image Reviewer
Neural Aesthetic Image Reviewer
Wenshan Wang
Su Yang
Weishan Zhang
Jiulong Zhang
19
38
0
28 Feb 2018
Attacking Visual Language Grounding with Adversarial Examples: A Case
  Study on Neural Image Captioning
Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning
Hongge Chen
Huan Zhang
Pin-Yu Chen
Jinfeng Yi
Cho-Jui Hsieh
GAN
AAML
27
49
0
06 Dec 2017
Diverse and Accurate Image Description Using a Variational Auto-Encoder
  with an Additive Gaussian Encoding Space
Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space
Liwei Wang
A. Schwing
Svetlana Lazebnik
CoGe
24
175
0
19 Nov 2017
AI Challenger : A Large-scale Dataset for Going Deeper in Image
  Understanding
AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding
Jiahong Wu
He Zheng
Bo-Lu Zhao
Yixin Li
Baoming Yan
...
Shipei Zhou
G. Lin
Yanwei Fu
Yizhou Wang
Yonggang Wang
VLM
30
149
0
17 Nov 2017
Self-Guiding Multimodal LSTM - when we do not have a perfect training
  dataset for image captioning
Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning
Yang Xian
Yingli Tian
VLM
23
22
0
15 Sep 2017
Multimodal Machine Learning: A Survey and Taxonomy
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
15
2,859
0
26 May 2017
Towards a Visual Privacy Advisor: Understanding and Predicting Privacy
  Risks in Images
Towards a Visual Privacy Advisor: Understanding and Predicting Privacy Risks in Images
Rakshith Shetty
Bernt Schiele
Mario Fritz
27
223
0
30 Mar 2017
Survey of the State of the Art in Natural Language Generation: Core
  tasks, applications and evaluation
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
Albert Gatt
E. Krahmer
LM&MA
ELM
27
808
0
29 Mar 2017
Where to put the Image in an Image Caption Generator
Where to put the Image in an Image Caption Generator
Marc Tanti
Albert Gatt
K. Camilleri
39
96
0
27 Mar 2017
Recurrent Models for Situation Recognition
Recurrent Models for Situation Recognition
Arun Mallya
Svetlana Lazebnik
12
30
0
18 Mar 2017
MAT: A Multimodal Attentive Translator for Image Captioning
MAT: A Multimodal Attentive Translator for Image Captioning
Chang Liu
F. Sun
Changhu Wang
Feng Wang
Alan Yuille
12
58
0
18 Feb 2017
Guided Open Vocabulary Image Captioning with Constrained Beam Search
Guided Open Vocabulary Image Captioning with Constrained Beam Search
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
21
232
0
02 Dec 2016
Semantic Regularisation for Recurrent Image Annotation
Semantic Regularisation for Recurrent Image Annotation
Feng Liu
Tao Xiang
Timothy M. Hospedales
Wankou Yang
Changyin Sun
29
103
0
16 Nov 2016
Boosting Image Captioning with Attributes
Boosting Image Captioning with Attributes
Ting Yao
Yingwei Pan
Yehao Li
Zhaofan Qiu
Tao Mei
VLM
31
620
0
05 Nov 2016
Seeing with Humans: Gaze-Assisted Neural Image Captioning
Seeing with Humans: Gaze-Assisted Neural Image Captioning
Yusuke Sugano
Andreas Bulling
16
68
0
18 Aug 2016
SPICE: Semantic Propositional Image Caption Evaluation
SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
34
1,883
0
29 Jul 2016
Movie Description
Movie Description
Anna Rohrbach
Atousa Torabi
Marcus Rohrbach
Niket Tandon
C. Pal
Hugo Larochelle
Aaron Courville
Bernt Schiele
3DV
VGen
32
353
0
12 May 2016
Visual Storytelling
Visual Storytelling
Ting-Hao 'Kenneth' Huang
Huang
Francis Ferraro
N. Mostafazadeh
Ishan Misra
...
C. L. Zitnick
Devi Parikh
Lucy Vanderwende
Michel Galley
Margaret Mitchell
VGen
11
464
0
13 Apr 2016
Rich Image Captioning in the Wild
Rich Image Captioning in the Wild
Kenneth Tran
Xiaodong He
Lei Zhang
Jian Sun
Cornelia Carapcea
Chris Thrasher
Chris Buehler
Chris Sienkiewicz
VLM
19
123
0
30 Mar 2016
Image Captioning and Visual Question Answering Based on Attributes and
  External Knowledge
Image Captioning and Visual Question Answering Based on Attributes and External Knowledge
Qi Wu
Chunhua Shen
A. Hengel
Peng Wang
A. Dick
19
360
0
09 Mar 2016
Automatic Description Generation from Images: A Survey of Models,
  Datasets, and Evaluation Measures
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures
Raffaella Bernardi
Ruken Cakici
Desmond Elliott
Aykut Erdem
Erkut Erdem
Nazli Ikizler-Cinbis
Frank Keller
A. Muscat
Barbara Plank
EGVM
VLM
19
363
0
15 Jan 2016
Deep Compositional Captioning: Describing Novel Object Categories
  without Paired Training Data
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data
Lisa Anne Hendricks
Subhashini Venugopalan
Marcus Rohrbach
Raymond J. Mooney
Kate Saenko
Trevor Darrell
CoGe
16
284
0
17 Nov 2015
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for
  Visual Question Answering
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
Huijuan Xu
Kate Saenko
22
760
0
17 Nov 2015
Describing Multimedia Content using Attention-based Encoder--Decoder
  Networks
Describing Multimedia Content using Attention-based Encoder--Decoder Networks
Kyunghyun Cho
Aaron Courville
Yoshua Bengio
32
411
0
04 Jul 2015
Jointly Modeling Embedding and Translation to Bridge Video and Language
Jointly Modeling Embedding and Translation to Bridge Video and Language
Yingwei Pan
Tao Mei
Ting Yao
Houqiang Li
Y. Rui
38
534
0
07 May 2015
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
Junhua Mao
W. Xu
Yi Yang
Jiang Wang
Zhiheng Huang
Alan Yuille
VLM
60
1,235
0
20 Dec 2014
Long-term Recurrent Convolutional Networks for Visual Recognition and
  Description
Long-term Recurrent Convolutional Networks for Visual Recognition and Description
Jeff Donahue
Lisa Anne Hendricks
Marcus Rohrbach
Subhashini Venugopalan
S. Guadarrama
Kate Saenko
Trevor Darrell
VLM
50
6,032
0
17 Nov 2014
1