Pixels to Prose: Understanding the art of Image Captioning

28 August 2024

Hrishikesh Singh

Aarti Sharma

Papers citing "Pixels to Prose: Understanding the art of Image Captioning"

50 / 56 papers shown

Title
Self-Supervised Image Captioning with CLIP Chuanyang Jin VLM SSL 61 2 0 26 Jun 2023
Deep Learning Approaches on Image Captioning: A Review Taraneh Ghandi H. Pourreza H. Mahyar VLM 94 98 0 31 Jan 2022
Neural Attention for Image Captioning: Review of Outstanding Methods Zanyar Zohourianshahzadi Jugal Kalita VLM 74 47 0 29 Nov 2021
Florence: A New Foundation Model for Computer Vision Lu Yuan Dongdong Chen Yi-Ling Chen Noel Codella Xiyang Dai ... Zhen Xiao Jianwei Yang Michael Zeng Luowei Zhou Pengchuan Zhang VLM 125 904 0 22 Nov 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning Matteo Stefanini Marcella Cornia Lorenzo Baraldi S. Cascianelli G. Fiameni Rita Cucchiara 3DV VLM MLLM 109 269 0 14 Jul 2021
Multimodal Few-Shot Learning with Frozen Language Models Maria Tsimpoukelli Jacob Menick Serkan Cabi S. M. Ali Eslami Oriol Vinyals Felix Hill MLLM 159 780 0 25 Jun 2021
Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training Jong Hak Moon HyunGyung Lee W. Shin Young-Hak Kim Edward Choi MedIm 56 159 0 24 May 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding Hu Xu Gargi Ghosh Po-Yao (Bernie) Huang Prahal Arora Masoumeh Aminzadeh Christoph Feichtenhofer Florian Metze Luke Zettlemoyer 45 132 0 20 May 2021
A Comprehensive Survey of Scene Graphs: Generation and Application Xiaojun Chang Pengzhen Ren Pengfei Xu Zhihui Li Xiaojiang Chen Alexander G. Hauptmann 3DV 93 232 0 17 Mar 2021
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision Wonjae Kim Bokyung Son Ildoo Kim VLM CLIP 119 1,745 0 05 Feb 2021
Language Models are Few-Shot Learners Tom B. Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan ... Christopher Berner Sam McCandlish Alec Radford Ilya Sutskever Dario Amodei BDL 752 41,932 0 28 May 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks Xiujun Li Xi Yin Chunyuan Li Pengchuan Zhang Xiaowei Hu ... Houdong Hu Li Dong Furu Wei Yejin Choi Jianfeng Gao VLM 105 1,938 0 13 Apr 2020
TextCaps: a Dataset for Image Captioning with Reading Comprehension Oleksii Sidorov Ronghang Hu Marcus Rohrbach Amanpreet Singh 82 413 0 24 Mar 2020
Normalized and Geometry-Aware Self-Attention Network for Image Captioning Longteng Guo Jing Liu Xinxin Zhu Peng Yao Shichen Lu Hanqing Lu ViT 177 191 0 19 Mar 2020
Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs Shizhe Chen Qin Jin Peng Wang Qi Wu DiffM 94 216 0 01 Mar 2020
Visual Commonsense R-CNN Tan Wang Jianqiang Huang Hanwang Zhang Qianru Sun SSL ObjD CML 57 249 0 27 Feb 2020
Captioning Images Taken by People Who Are Blind Danna Gurari Yinan Zhao Meng Zhang Nilavra Bhattacharya 61 182 0 20 Feb 2020
Meshed-Memory Transformer for Image Captioning Marcella Cornia Matteo Stefanini Lorenzo Baraldi Rita Cucchiara 70 880 0 17 Dec 2019
TPsgtR: Neural-Symbolic Tensor Product Scene-Graph-Triplet Representation for Image Captioning C. Sur 54 13 0 22 Nov 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers Hao Hao Tan Joey Tianyi Zhou VLM MLLM 237 2,479 0 20 Aug 2019
Attention on Attention for Image Captioning Lun Huang Wenmin Wang Jie Chen Xiao-Yong Wei 65 832 0 19 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks Jiasen Lu Dhruv Batra Devi Parikh Stefan Lee SSL VLM 226 3,678 0 06 Aug 2019
Automatic Radiology Report Generation based on Multi-view Image Fusion and Medical Concept Enrichment Jianbo Yuan Haofu Liao R. Luo Jiebo Luo MedIm 73 194 0 22 Jul 2019
Image Captioning: Transforming Objects into Words Simão Herdade Armin Kappeler K. Boakye Joao Soares ViT 114 470 0 14 Jun 2019
An Attentive Survey of Attention Models S. Chaudhari Varun Mithal Gungor Polatkan R. Ramanath 130 659 0 05 Apr 2019
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment Samyak Datta Karan Sikka Anirban Roy Karuna Ahuja Devi Parikh Ajay Divakaran 55 104 0 27 Mar 2019
Unpaired Image Captioning via Scene Graph Alignments Jiuxiang Gu Shafiq Joty Jianfei Cai Handong Zhao Xu Yang G. Wang GNN 64 174 0 26 Mar 2019
MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs Alistair E. W. Johnson Tom Pollard Nathaniel R. Greenbaum M. Lungren Chih-ying Deng Yifan Peng Zhiyong Lu R. Mark Seth Berkowitz Steven Horng MedIm 94 810 0 21 Jan 2019
A Survey of the Recent Architectures of Deep Convolutional Neural Networks Asifullah Khan A. Sohail Umme Zahoora Aqsa Saeed Qureshi OOD 104 2,302 0 17 Jan 2019
Auto-Encoding Scene Graphs for Image Captioning Xu Yang Kaihua Tang Hanwang Zhang Jianfei Cai 152 699 0 06 Dec 2018
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions Marcella Cornia Lorenzo Baraldi Rita Cucchiara DiffM 65 175 0 26 Nov 2018
Engaging Image Captioning Via Personality Kurt Shuster Samuel Humeau Hexiang Hu Antoine Bordes Jason Weston 72 152 0 25 Oct 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 1.7K 94,770 0 11 Oct 2018
A Comprehensive Survey of Deep Learning for Image Captioning Md Zakir Hossain Ferdous Sohel M. Shiratuddin Hamid Laga VLM 3DV 85 774 0 06 Oct 2018
Exploring Visual Relationship for Image Captioning Ting Yao Yingwei Pan Yehao Li Tao Mei 74 833 0 19 Sep 2018
Convolutional Image Captioning J. Aneja Aditya Deshpande Alex Schwing VLM 124 361 0 24 Nov 2017
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering Peter Anderson Xiaodong He Chris Buehler Damien Teney Mark Johnson Stephen Gould Lei Zhang AIMat 121 4,215 0 25 Jul 2017
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning Jiasen Lu Caiming Xiong Devi Parikh R. Socher 123 1,452 0 06 Dec 2016
Xception: Deep Learning with Depthwise Separable Convolutions François Chollet MDE BDL PINN 1.4K 14,559 0 07 Oct 2016
Densely Connected Convolutional Networks Gao Huang Zhuang Liu Laurens van der Maaten Kilian Q. Weinberger PINN 3DV 772 36,794 0 25 Aug 2016
SPICE: Semantic Propositional Image Caption Evaluation Peter Anderson Basura Fernando Mark Johnson Stephen Gould EGVM 102 1,914 0 29 Jul 2016
Generation and Comprehension of Unambiguous Object Descriptions Junhua Mao Jonathan Huang Alexander Toshev Oana-Maria Camburu Alan Yuille Kevin Patrick Murphy ObjD 118 1,345 0 07 Nov 2015
SentiCap: Generating Image Descriptions with Sentiments A. Mathews Lexing Xie Xuming He 80 221 0 06 Oct 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren Kaiming He Ross B. Girshick Jian Sun AIMat ObjD 502 62,270 0 04 Jun 2015
What value do explicit high level concepts have in vision to language problems? Qi Wu Chunhua Shen Lingqiao Liu A. Dick Anton Van Den Hengel 77 443 0 03 Jun 2015
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models Bryan A. Plummer Liwei Wang Christopher M. Cervantes Juan C. Caicedo Julia Hockenmaier Svetlana Lazebnik 196 2,056 0 19 May 2015
Exploring Nearest Neighbor Approaches for Image Captioning Jacob Devlin Saurabh Gupta Ross B. Girshick Margaret Mitchell C. L. Zitnick 177 195 0 17 May 2015
Fast R-CNN Ross B. Girshick ObjD 303 25,051 0 30 Apr 2015
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift Sergey Ioffe Christian Szegedy OOD 463 43,289 0 11 Feb 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention Ke Xu Jimmy Ba Ryan Kiros Kyunghyun Cho Aaron Courville Ruslan Salakhutdinov R. Zemel Yoshua Bengio DiffM 340 10,069 0 10 Feb 2015