Language Models for Image Captioning: The Quirks and What Works

7 May 2015

Li Deng

Papers citing "Language Models for Image Captioning: The Quirks and What Works"

45 / 45 papers shown

Title
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model Cheng Yang Yang Sui Jinqi Xiao Lingyi Huang Yu Gong ... Jinghua Yan Y. Bai P. Sadayappan Xia Hu Bo Yuan VLM 59 0 0 24 Mar 2025
Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores Chantal Shaib Joe Barrow Jiuding Sun Alexa F. Siu Byron C. Wallace A. Nenkova 66 33 0 01 Mar 2024
Text-Only Training for Visual Storytelling Yuechen Wang Wen-gang Zhou Zhenbo Lu Houqiang Li DiffM 28 2 0 17 Aug 2023
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models Luke Vilnis Yury Zemlyanskiy Patrick C. Murray Alexandre Passos Sumit Sanghai 59 9 0 18 Oct 2022
It Isn't Sh!tposting, It's My CAT Posting Parthsarthi Rawat Sayan Das Jorge Aguirre Akhil Daphara ViT 22 0 0 18 May 2022
Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention S. Tan Runpei Dong Kaisheng Ma 22 2 0 03 Nov 2021
Multi-Modal Image Captioning for the Visually Impaired Hiba Ahsan Nikita Bhalla Daivat Bhatt Kaivankumar Shah 22 20 0 17 May 2021
Every Model Learned by Gradient Descent Is Approximately a Kernel Machine Pedro M. Domingos MLT 29 70 0 30 Nov 2020
Gaussian Smoothen Semantic Features (GSSF) -- Exploring the Linguistic Aspects of Visual Captioning in Indian Languages (Bengali) Using MSCOCO Framework C. Sur 24 7 0 16 Feb 2020
MRRC: Multiple Role Representation Crossover Interpretation for Image Captioning With R-CNN Feature Distribution Composition (FDC) C. Sur 25 16 0 15 Feb 2020
Going Beneath the Surface: Evaluating Image Captioning for Grammaticality, Truthfulness and Diversity Huiyuan Xie Tom Sherborne A. Kuhnle Ann A. Copestake DiffM 22 9 0 19 Dec 2019
Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings Gregor Wiedemann Steffen Remus Avi Chawla Chris Biemann 19 174 0 23 Sep 2019
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators Kuang-Huei Lee Hamid Palangi Xi Chen Houdong Hu Jianfeng Gao VLM 24 37 0 22 Sep 2019
Compositional Generalization in Image Captioning Mitja Nikolaus Mostafa Abdou Matthew Lamm Rahul Aralikatte Desmond Elliott CoGe 24 49 0 10 Sep 2019
MeetUp! A Corpus of Joint Activity Dialogues in a Visual Environment N. Ilinykh Sina Zarrieß David Schlangen 24 43 0 11 Jul 2019
Sequence-to-Sequence Models for Data-to-Text Natural Language Generation: Word- vs. Character-based Processing and Output Diversity Glorianna Jagfeld Sabrina Jenne Ngoc Thang Vu AIMat 38 24 0 11 Oct 2018
A Comprehensive Survey of Deep Learning for Image Captioning Md. Zakir Hossain Ferdous Sohel M. Shiratuddin Hamid Laga VLM 3DV 33 760 0 06 Oct 2018
Stacked Cross Attention for Image-Text Matching Kuang-Huei Lee Xi Chen G. Hua Houdong Hu Xiaodong He 15 1,140 0 21 Mar 2018
Neural Aesthetic Image Reviewer Wenshan Wang Su Yang Weishan Zhang Jiulong Zhang 19 38 0 28 Feb 2018
Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning Hongge Chen Huan Zhang Pin-Yu Chen Jinfeng Yi Cho-Jui Hsieh GAN AAML 29 49 0 06 Dec 2017
Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space Liwei Wang A. Schwing Svetlana Lazebnik CoGe 31 175 0 19 Nov 2017
AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding Jiahong Wu He Zheng Bo-Lu Zhao Yixin Li Baoming Yan ... Shipei Zhou G. Lin Yanwei Fu Yizhou Wang Yonggang Wang VLM 32 149 0 17 Nov 2017
Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning Yang Xian Yingli Tian VLM 25 22 0 15 Sep 2017
Multimodal Machine Learning: A Survey and Taxonomy T. Baltrušaitis Chaitanya Ahuja Louis-Philippe Morency 15 2,859 0 26 May 2017
Towards a Visual Privacy Advisor: Understanding and Predicting Privacy Risks in Images Rakshith Shetty Bernt Schiele Mario Fritz 32 223 0 30 Mar 2017
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation Albert Gatt E. Krahmer LM&MA ELM 27 810 0 29 Mar 2017
Where to put the Image in an Image Caption Generator Marc Tanti Albert Gatt K. Camilleri 44 96 0 27 Mar 2017
Recurrent Models for Situation Recognition Arun Mallya Svetlana Lazebnik 14 30 0 18 Mar 2017
MAT: A Multimodal Attentive Translator for Image Captioning Chang Liu F. Sun Changhu Wang Feng Wang Alan Yuille 17 58 0 18 Feb 2017
Guided Open Vocabulary Image Captioning with Constrained Beam Search Peter Anderson Basura Fernando Mark Johnson Stephen Gould 21 232 0 02 Dec 2016
Semantic Regularisation for Recurrent Image Annotation Feng Liu Tao Xiang Timothy M. Hospedales Wankou Yang Changyin Sun 29 103 0 16 Nov 2016
Boosting Image Captioning with Attributes Ting Yao Yingwei Pan Yehao Li Zhaofan Qiu Tao Mei VLM 33 620 0 05 Nov 2016
Seeing with Humans: Gaze-Assisted Neural Image Captioning Yusuke Sugano Andreas Bulling 18 68 0 18 Aug 2016
SPICE: Semantic Propositional Image Caption Evaluation Peter Anderson Basura Fernando Mark Johnson Stephen Gould EGVM 34 1,883 0 29 Jul 2016
Movie Description Anna Rohrbach Atousa Torabi Marcus Rohrbach Niket Tandon C. Pal Hugo Larochelle Aaron Courville Bernt Schiele 3DV VGen 32 353 0 12 May 2016
Visual Storytelling Ting-Hao 'Kenneth' Huang Francis Ferraro N. Mostafazadeh Ishan Misra ... C. L. Zitnick Devi Parikh Lucy Vanderwende Michel Galley Margaret Mitchell VGen 16 464 0 13 Apr 2016
Rich Image Captioning in the Wild Kenneth Tran Xiaodong He Lei Zhang Jian Sun Cornelia Carapcea Chris Thrasher Chris Buehler Chris Sienkiewicz VLM 19 123 0 30 Mar 2016
Image Captioning and Visual Question Answering Based on Attributes and External Knowledge Qi Wu Chunhua Shen Anton Van Den Hengel Peng Wang A. Dick 19 360 0 09 Mar 2016
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures Raffaella Bernardi Ruken Cakici Desmond Elliott Aykut Erdem Erkut Erdem Nazli Ikizler-Cinbis Frank Keller A. Muscat Barbara Plank EGVM VLM 21 363 0 15 Jan 2016
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data Lisa Anne Hendricks Subhashini Venugopalan Marcus Rohrbach Raymond J. Mooney Kate Saenko Trevor Darrell CoGe 16 284 0 17 Nov 2015
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering Huijuan Xu Kate Saenko 22 760 0 17 Nov 2015
Describing Multimedia Content using Attention-based Encoder-Decoder Networks Kyunghyun Cho Aaron Courville Yoshua Bengio 32 411 0 04 Jul 2015
Jointly Modeling Embedding and Translation to Bridge Video and Language Yingwei Pan Tao Mei Ting Yao Houqiang Li Y. Rui 41 534 0 07 May 2015
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) Junhua Mao Wenyuan Xu Yi Yang Jiang Wang Zhiheng Huang Alan Yuille VLM 60 1,235 0 20 Dec 2014
Long-term Recurrent Convolutional Networks for Visual Recognition and Description Jeff Donahue Lisa Anne Hendricks Marcus Rohrbach Subhashini Venugopalan S. Guadarrama Kate Saenko Trevor Darrell VLM 55 6,032 0 17 Nov 2014