Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2408.15714
Cited By
Pixels to Prose: Understanding the art of Image Captioning
28 August 2024
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Pixels to Prose: Understanding the art of Image Captioning"
50 / 56 papers shown
Title
Self-Supervised Image Captioning with CLIP
Chuanyang Jin
VLM
SSL
61
2
0
26 Jun 2023
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi
H. Pourreza
H. Mahyar
VLM
94
98
0
31 Jan 2022
Neural Attention for Image Captioning: Review of Outstanding Methods
Zanyar Zohourianshahzadi
Jugal Kalita
VLM
74
47
0
29 Nov 2021
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
125
904
0
22 Nov 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
109
269
0
14 Jul 2021
Multimodal Few-Shot Learning with Frozen Language Models
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
159
780
0
25 Jun 2021
Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training
Jong Hak Moon
HyunGyung Lee
W. Shin
Young-Hak Kim
Edward Choi
MedIm
56
159
0
24 May 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Prahal Arora
Masoumeh Aminzadeh
Christoph Feichtenhofer
Florian Metze
Luke Zettlemoyer
45
132
0
20 May 2021
A Comprehensive Survey of Scene Graphs: Generation and Application
Xiaojun Chang
Pengzhen Ren
Pengfei Xu
Zhihui Li
Xiaojiang Chen
Alexander G. Hauptmann
3DV
93
232
0
17 Mar 2021
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
Wonjae Kim
Bokyung Son
Ildoo Kim
VLM
CLIP
119
1,745
0
05 Feb 2021
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
752
41,932
0
28 May 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
105
1,938
0
13 Apr 2020
TextCaps: a Dataset for Image Captioning with Reading Comprehension
Oleksii Sidorov
Ronghang Hu
Marcus Rohrbach
Amanpreet Singh
82
413
0
24 Mar 2020
Normalized and Geometry-Aware Self-Attention Network for Image Captioning
Longteng Guo
Jing Liu
Xinxin Zhu
Peng Yao
Shichen Lu
Hanqing Lu
ViT
177
191
0
19 Mar 2020
Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Shizhe Chen
Qin Jin
Peng Wang
Qi Wu
DiffM
94
216
0
01 Mar 2020
Visual Commonsense R-CNN
Tan Wang
Jianqiang Huang
Hanwang Zhang
Qianru Sun
SSL
ObjD
CML
57
249
0
27 Feb 2020
Captioning Images Taken by People Who Are Blind
Danna Gurari
Yinan Zhao
Meng Zhang
Nilavra Bhattacharya
61
182
0
20 Feb 2020
Meshed-Memory Transformer for Image Captioning
Marcella Cornia
Matteo Stefanini
Lorenzo Baraldi
Rita Cucchiara
70
880
0
17 Dec 2019
TPsgtR: Neural-Symbolic Tensor Product Scene-Graph-Triplet Representation for Image Captioning
C. Sur
54
13
0
22 Nov 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
237
2,479
0
20 Aug 2019
Attention on Attention for Image Captioning
Lun Huang
Wenmin Wang
Jie Chen
Xiao-Yong Wei
65
832
0
19 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
226
3,678
0
06 Aug 2019
Automatic Radiology Report Generation based on Multi-view Image Fusion and Medical Concept Enrichment
Jianbo Yuan
Haofu Liao
R. Luo
Jiebo Luo
MedIm
73
194
0
22 Jul 2019
Image Captioning: Transforming Objects into Words
Simão Herdade
Armin Kappeler
K. Boakye
Joao Soares
ViT
114
470
0
14 Jun 2019
An Attentive Survey of Attention Models
S. Chaudhari
Varun Mithal
Gungor Polatkan
R. Ramanath
130
659
0
05 Apr 2019
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
Samyak Datta
Karan Sikka
Anirban Roy
Karuna Ahuja
Devi Parikh
Ajay Divakaran
55
104
0
27 Mar 2019
Unpaired Image Captioning via Scene Graph Alignments
Jiuxiang Gu
Shafiq Joty
Jianfei Cai
Handong Zhao
Xu Yang
G. Wang
GNN
64
174
0
26 Mar 2019
MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs
Alistair E. W. Johnson
Tom Pollard
Nathaniel R. Greenbaum
M. Lungren
Chih-ying Deng
Yifan Peng
Zhiyong Lu
R. Mark
Seth Berkowitz
Steven Horng
MedIm
94
810
0
21 Jan 2019
A Survey of the Recent Architectures of Deep Convolutional Neural Networks
Asifullah Khan
A. Sohail
Umme Zahoora
Aqsa Saeed Qureshi
OOD
104
2,302
0
17 Jan 2019
Auto-Encoding Scene Graphs for Image Captioning
Xu Yang
Kaihua Tang
Hanwang Zhang
Jianfei Cai
152
699
0
06 Dec 2018
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
DiffM
65
175
0
26 Nov 2018
Engaging Image Captioning Via Personality
Kurt Shuster
Samuel Humeau
Hexiang Hu
Antoine Bordes
Jason Weston
72
152
0
25 Oct 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.7K
94,770
0
11 Oct 2018
A Comprehensive Survey of Deep Learning for Image Captioning
Md Zakir Hossain
Ferdous Sohel
M. Shiratuddin
Hamid Laga
VLM
3DV
85
774
0
06 Oct 2018
Exploring Visual Relationship for Image Captioning
Ting Yao
Yingwei Pan
Yehao Li
Tao Mei
74
833
0
19 Sep 2018
Convolutional Image Captioning
J. Aneja
Aditya Deshpande
Alex Schwing
VLM
124
361
0
24 Nov 2017
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
121
4,215
0
25 Jul 2017
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning
Jiasen Lu
Caiming Xiong
Devi Parikh
R. Socher
123
1,452
0
06 Dec 2016
Xception: Deep Learning with Depthwise Separable Convolutions
François Chollet
MDE
BDL
PINN
1.4K
14,559
0
07 Oct 2016
Densely Connected Convolutional Networks
Gao Huang
Zhuang Liu
Laurens van der Maaten
Kilian Q. Weinberger
PINN
3DV
772
36,794
0
25 Aug 2016
SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
102
1,914
0
29 Jul 2016
Generation and Comprehension of Unambiguous Object Descriptions
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
ObjD
118
1,345
0
07 Nov 2015
SentiCap: Generating Image Descriptions with Sentiments
A. Mathews
Lexing Xie
Xuming He
80
221
0
06 Oct 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat
ObjD
502
62,270
0
04 Jun 2015
What value do explicit high level concepts have in vision to language problems?
Qi Wu
Chunhua Shen
Lingqiao Liu
A. Dick
Anton Van Den Hengel
77
443
0
03 Jun 2015
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Julia Hockenmaier
Svetlana Lazebnik
196
2,056
0
19 May 2015
Exploring Nearest Neighbor Approaches for Image Captioning
Jacob Devlin
Saurabh Gupta
Ross B. Girshick
Margaret Mitchell
C. L. Zitnick
177
195
0
17 May 2015
Fast R-CNN
Ross B. Girshick
ObjD
303
25,051
0
30 Apr 2015
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe
Christian Szegedy
OOD
463
43,289
0
11 Feb 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
340
10,069
0
10 Feb 2015
1
2
Next