1411.4555
Cited By
Show and Tell: A Neural Image Caption Generator
17 November 2014
Oriol Vinyals
Alexander Toshev
Samy Bengio
D. Erhan
3DV
Papers citing
"Show and Tell: A Neural Image Caption Generator"
50 / 2,022 papers shown
Title
Fact-Checking of AI-Generated Reports
Razi Mahmood
Ge Wang
Mannudeep Kalra
Pingkun Yan
MedIm
26
7
0
27 Jul 2023
Enhancing image captioning with depth information using a Transformer-based framework
Aya Mahmoud Ahmed
Mohamed Yousef
K. Hussain
Yousef B. Mahdy
ViT
29
4
0
24 Jul 2023
EnTri: Ensemble Learning with Tri-level Representations for Explainable Scene Recognition
Amirhossein Aminimehr
Amir Molaei
Min Zhang
33
1
0
23 Jul 2023
OxfordTVG-HIC: Can Machine Make Humorous Captions from Images?
Runjia Li
Shuyang Sun
Mohamed Elhoseiny
Philip Torr
36
11
0
21 Jul 2023
TbExplain: A Text-based Explanation Method for Scene Classification Models with the Statistical Prediction Correction
Amirhossein Aminimehr
Pouya Khani
Amir Molaei
Amirmohammad Kazemeini
Min Zhang
FAtt
24
5
0
19 Jul 2023
Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning
Zijie Song
Zhenzhen Hu
Yuanen Zhou
Ye Zhao
Richang Hong
Meng Wang
21
2
0
19 Jul 2023
GenAssist: Making Image Generation Accessible
Mina Huh
Yi-Hao Peng
Amy Pavel
DiffM
25
29
0
14 Jul 2023
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes
Guoyun Tu
Ying Liu
Vladimir Vlassov
139
1
0
14 Jul 2023
Reading Radiology Imaging Like The Radiologist
Yuhao Wang
MedIm
34
0
0
12 Jul 2023
Linear Alignment of Vision-language Models for Image Captioning
Fabian Paischer
M. Hofmarcher
Sepp Hochreiter
Thomas Adler
CLIP
VLM
50
0
0
10 Jul 2023
MMNet: Multi-Collaboration and Multi-Supervision Network for Sequential Deepfake Detection
Ruiyang Xia
Decheng Liu
Jie Li
Lin Yuan
N. Wang
Xinbo Gao
28
17
0
06 Jul 2023
VisText: A Benchmark for Semantically Rich Chart Captioning
Benny J. Tang
Angie Boggust
Arvind Satyanarayan
31
76
0
28 Jun 2023
Self-Supervised Image Captioning with CLIP
Chuanyang Jin
VLM
SSL
26
2
0
26 Jun 2023
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards
Yangjun Mao
Jun Xiao
Dong Zhang
Meng Cao
Jian Shao
Yueting Zhuang
Long Chen
EGVM
32
9
0
25 Jun 2023
Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation
Zihao Yue
Anwen Hu
Liang Zhang
Qin Jin
24
2
0
23 Jun 2023
KiUT: Knowledge-injected U-Transformer for Radiology Report Generation
Zhongzhen Huang
Xiaofan Zhang
Shaoting Zhang
MedIm
25
51
0
20 Jun 2023
Replace and Report: NLP Assisted Radiology Report Generation
Kaveri Kale
P. Bhattacharyya
Kshitij Sharad Jadhav
LM&MA
MedIm
14
11
0
19 Jun 2023
Generation of Radiology Findings in Chest X-Ray by Leveraging Collaborative Knowledge
Manuela Danu
George Marica
Sanjeev Kumar Karn
Bogdan Georgescu
Awais Mansoor
...
Lucian Mihai Itu
C. Suciu
Sasa Grbic
Oladimeji Farri
Dorin Comaniciu
MedIm
23
8
0
18 Jun 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
38
7
0
14 Jun 2023
Scalable 3D Captioning with Pretrained Models
Tiange Luo
C. Rockwell
Honglak Lee
Justin Johnson
26
152
0
12 Jun 2023
ORGAN: Observation-Guided Radiology Report Generation via Tree Reasoning
Wenjun Hou
Kaishuai Xu
Yi Cheng
Wenjie Li
Jiangming Liu
19
33
0
10 Jun 2023
Object Detection with Transformers: A Review
Tahira Shehzadi
K. Hashmi
D. Stricker
Muhammad Zeshan Afzal
ViT
MU
23
28
0
07 Jun 2023
Too Large; Data Reduction for Vision-Language Pre-Training
Alex Jinpeng Wang
Kevin Qinghong Lin
David Junhao Zhang
Stan Weixian Lei
Mike Zheng Shou
VLM
33
24
0
31 May 2023
Learning without Forgetting for Vision-Language Models
Da-Wei Zhou
Yuanhan Zhang
Jingyi Ning
De-Chuan Zhan
Ziwei Liu
VLM
CLL
74
37
0
30 May 2023
Image Captioning with Multi-Context Synthetic Data
Feipeng Ma
Y. Zhou
Fengyun Rao
Yueyi Zhang
Xiaoyan Sun
DiffM
33
7
0
29 May 2023
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
Noam Rotstein
David Bensaid
Shaked Brody
Roy Ganz
Ron Kimmel
VLM
26
27
0
28 May 2023
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
Dachuan Shi
Chaofan Tao
Anyi Rao
Zhendong Yang
Chun Yuan
Jiaqi Wang
VLM
34
22
0
27 May 2023
S4M: Generating Radiology Reports by A Single Model for Multiple Body Parts
Qi Chen
Yutong Xie
Biao Wu
Minh Nguyen Nhat To
James Ang
Qi Wu
13
1
0
26 May 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
Chia-Wen Kuo
Z. Kira
37
21
0
25 May 2023
Meta-learning For Vision-and-language Cross-lingual Transfer
Hanxu Hu
Frank Keller
VLM
31
1
0
24 May 2023
Exploring Diverse In-Context Configurations for Image Captioning
Xu Yang
Yongliang Wu
Mingzhuo Yang
Haokun Chen
Xin Geng
MLLM
27
51
0
24 May 2023
Alt-Text with Context: Improving Accessibility for Images on Twitter
Nikita Srivatsan
Sofia Samaniego
Omar U. Florez
Taylor Berg-Kirkpatrick
20
3
0
24 May 2023
UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning
Ahmed Masry
P. Kavehzadeh
Do Xuan Long
Enamul Hoque
Chenyu You
LRM
27
100
0
24 May 2023
Copy Recurrent Neural Network Structure Network
Xiaofan Zhou
Xunzhu Tang
19
0
0
22 May 2023
Text-based Person Search without Parallel Image-Text Data
Yang Bai
Wenwen Qiang
Min Cao
Cheng Chen
Ziqiang Cao
Liqiang Nie
Min Zhang
38
13
0
22 May 2023
Album Storytelling with Iterative Story-aware Captioning and Large Language Models
Munan Ning
Yujia Xie
Dongdong Chen
Zeyin Song
Lu Yuan
Yonghong Tian
QiXiang Ye
Liuliang Yuan
33
8
0
22 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
Jiaheng Liu
15
1
0
19 May 2023
Generating Visual Spatial Description via Holistic 3D Scene Understanding
Yu Zhao
Hao Fei
Wei Ji
Jianguo Wei
Meishan Zhang
Tat-Seng Chua
28
33
0
19 May 2023
Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training
Kecheng Zhang
Jing Zhang
Jun Yu
Han Jiang
Jianping Fan
Qing-An Huang
Weidong Han
MedIm
38
29
0
13 May 2023
Automatic Radiology Report Generation by Learning with Increasingly Hard Negatives
Bhanu Prakash Voutharoja
Lei Wang
Luping Zhou
MedIm
33
8
0
11 May 2023
Simple Token-Level Confidence Improves Caption Correctness
Suzanne Petryk
Spencer Whitehead
Joseph E. Gonzalez
Trevor Darrell
Anna Rohrbach
Marcus Rohrbach
31
7
0
11 May 2023
Image-to-Text Translation for Interactive Image Recognition: A Comparative User Study with Non-Expert Users
Wataru Kawabe
Yusuke Sugano
VLM
35
2
0
11 May 2023
Image Captioners Sometimes Tell More Than Images They See
Honori Udo
Takafumi Koshinaka
VLM
17
4
0
04 May 2023
Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Teng Wang
Jinrui Zhang
Junjie Fei
Hao Zheng
Yunlong Tang
Zhe Li
Mingqi Gao
Shanshan Zhao
MLLM
104
82
0
04 May 2023
Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime
Chuhan Zhang
Antoine Miech
Jiajun Shen
Jean-Baptiste Alayrac
Pauline Luc
VLM
VPVLM
47
2
0
03 May 2023
Transforming Visual Scene Graphs to Image Captions
Xu Yang
Jiawei Peng
Zihua Wang
Haiyang Xu
Qinghao Ye
Chenliang Li
Mingshi Yan
Feisi Huang
Zhangzikang Li
Yu Zhang
49
19
0
03 May 2023
Multimodal Data Augmentation for Image Captioning using Diffusion Models
Changrong Xiao
S. Xu
Kunpeng Zhang
DiffM
27
10
0
03 May 2023
Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters
Kristina Tesch
Timo Gerkmann
26
16
0
24 Apr 2023
Grounding Classical Task Planners via Vision-Language Models
Xiaohan Zhang
Yan Ding
S. Amiri
Hao Yang
Andy Kaminski
Chad Esselink
Shiqi Zhang
26
17
0
17 Apr 2023
Interactive and Explainable Region-guided Radiology Report Generation
Tim Tanida
Philip Muller
Georgios Kaissis
Daniel Rueckert
MedIm
43
110
0
17 Apr 2023