1411.4555
Show and Tell: A Neural Image Caption Generator
17 November 2014
Oriol Vinyals
Alexander Toshev
Samy Bengio
D. Erhan
3DV
Papers citing "Show and Tell: A Neural Image Caption Generator"
50 / 2,023 papers shown
Title
Writing by Memorizing: Hierarchical Retrieval-based Medical Report Generation
Xingyi Yang
Muchao Ye
Quanzeng You
Fenglong Ma
MedIm
24
38
0
25 May 2021
Automated Knee X-ray Report Generation
Aydan Gasimova
Giovanni Montana
Daniel Rueckert
MedIm
9
1
0
22 May 2021
Dependent Multi-Task Learning with Causal Intervention for Image Captioning
Wenqing Chen
Jidong Tian
Caoyun Fan
Hao He
Yaohui Jin
CML
27
6
0
18 May 2021
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval
K. Ueki
23
3
0
16 May 2021
Empirical Analysis of Image Caption Generation using Deep Learning
Aditya R. Bhattacharya
Eshwar Shamanna Girishekar
Padmakar Anil Deshpande
21
1
0
14 May 2021
Connecting What to Say With Where to Look by Modeling Human Attention Traces
Zihang Meng
Licheng Yu
Ning Zhang
Tamara L. Berg
Babak Damavandi
Vikas Singh
Amy Bearman
40
25
0
12 May 2021
Instance-aware Remote Sensing Image Captioning with Cross-hierarchy Attention
Chengze Wang
Zhiyu Jiang
Yuan Yuan
16
11
0
11 May 2021
Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning
D. Guo
Ruiying Lu
Bo Chen
Zequn Zeng
Mingyuan Zhou
VLM
27
9
0
10 May 2021
A Hybrid Model for Combining Neural Image Caption and k-Nearest Neighbor Approach for Image Captioning
Kartik Arora
Ajul Raj
Arun Goel
Seba Susan
11
0
0
09 May 2021
Exploring Explicit and Implicit Visual Relationships for Image Captioning
Zeliang Song
Xiaofei Zhou
21
7
0
06 May 2021
LFI-CAM: Learning Feature Importance for Better Visual Explanation
Kwang Hee Lee
Chaewon Park
J. Oh
Nojun Kwak
FAtt
37
27
0
03 May 2021
Maneuver-Aware Pooling for Vehicle Trajectory Prediction
Mohamed Hasan
Albert Solernou
Evangelos Paschalidis
He Wang
Gustav Markkula
R. Romano
24
16
0
29 Apr 2021
Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning
Ukyo Honda
Yoshitaka Ushiku
Atsushi Hashimoto
Taro Watanabe
Yuji Matsumoto
33
23
0
28 Apr 2021
Contextualized Keyword Representations for Multi-modal Retinal Image Captioning
Jia-Hong Huang
Ting-Wei Wu
M. Worring
MedIm
68
26
0
26 Apr 2021
MusCaps: Generating Captions for Music Audio
Ilaria Manco
Emmanouil Benetos
Elio Quinton
Gyorgy Fazekas
35
36
0
24 Apr 2021
Weakly-supervised Multi-task Learning for Multimodal Affect Recognition
Wenliang Dai
Samuel Cahyawijaya
Yejin Bang
Pascale Fung
CVBM
41
11
0
23 Apr 2021
Towards Accurate Text-based Image Captioning with Content Diversity Exploration
Guanghui Xu
Shuaicheng Niu
Mingkui Tan
Yucheng Luo
Qing Du
Qi Wu
DiffM
27
56
0
23 Apr 2021
Maneuver-based Anchor Trajectory Hypotheses at Roundabouts
Mohamed Hasan
Evangelos Paschalidis
Albert Solernou
He Wang
Gustav Markkula
R. Romano
LLMSV
39
3
0
22 Apr 2021
Discrete-continuous Action Space Policy Gradient-based Attention for Image-Text Matching
Shiyang Yan
Li Yu
Yuan Xie
47
34
0
21 Apr 2021
Robust Open-Vocabulary Translation from Visual Text Representations
Elizabeth Salesky
David Etter
Matt Post
VLM
27
39
0
16 Apr 2021
Compressing Visual-linguistic Model via Knowledge Distillation
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lijuan Wang
Yezhou Yang
Zicheng Liu
VLM
39
97
0
05 Apr 2021
Towards General Purpose Vision Systems
Tanmay Gupta
Amita Kamath
Aniruddha Kembhavi
Derek Hoiem
13
50
0
01 Apr 2021
Attention, please! A survey of Neural Attention Models in Deep Learning
Alana de Santana Correia
Esther Luna Colombini
HAI
28
175
0
31 Mar 2021
A Comprehensive Review of the Video-to-Text Problem
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
13
17
0
27 Mar 2021
On the hidden treasure of dialog in video question answering
Deniz Engin
François Schnitzler
Ngoc Q. K. Duong
Yannis Avrithis
29
10
0
26 Mar 2021
Describing and Localizing Multiple Changes with Transformers
Yue Qiu
Shintaro Yamamoto
Kodai Nakashima
Ryota Suzuki
K. Iwata
Hirokatsu Kataoka
Y. Satoh
30
55
0
25 Mar 2021
Shared Latent Space of Font Shapes and Their Noisy Impressions
Jihun Kang
Daichi Haraguchi
Seiya Matsuda
Akisato Kimura
S. Uchida
30
5
0
23 Mar 2021
Human-like Controllable Image Captioning with Verb-specific Semantic Roles
Long Chen
Zhihong Jiang
Jun Xiao
Wei Liu
32
74
0
22 Mar 2021
Alleviate Exposure Bias in Sequence Prediction with Recurrent Neural Networks
Liping Yuan
Jiangtao Feng
Xiaoqing Zheng
Xuanjing Huang
33
1
0
22 Mar 2021
Cross-modal Image Retrieval with Deep Mutual Information Maximization
Chunbin Gu
Jiajun Bu
Xixi Zhou
Chengwei Yao
Dongfang Ma
Zhi Yu
Xifeng Yan
23
16
0
10 Mar 2021
Analysis of Convolutional Decoder for Image Caption Generation
Sulabh Katiyar
S. Borgohain
18
0
0
08 Mar 2021
Relationship-based Neural Baby Talk
Fan Fu
Tingting Xie
Ioannis Patras
Sepehr Jalali
14
0
0
08 Mar 2021
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision
Andrew Shin
Masato Ishii
T. Narihira
35
37
0
06 Mar 2021
Causal Attention for Vision-Language Tasks
Xu Yang
Hanwang Zhang
Guojun Qi
Jianfei Cai
CML
36
149
0
05 Mar 2021
End-to-end acoustic modelling for phone recognition of young readers
Lucile Gelin
Morgane Daniel
J. Pinquier
Thomas Pellegrini
18
13
0
04 Mar 2021
EnD: Entangling and Disentangling deep representations for bias correction
Enzo Tartaglione
C. Barbano
Marco Grangetto
26
123
0
02 Mar 2021
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
Krishna Srinivasan
K. Raman
Jiecao Chen
Michael Bendersky
Marc Najork
VLM
213
310
0
02 Mar 2021
MultiSubs: A Large-scale Multimodal and Multilingual Dataset
Josiah Wang
Pranava Madhyastha
J. Figueiredo
Chiraag Lala
Lucia Specia
VGen
22
11
0
02 Mar 2021
Mitigating Edge Machine Learning Inference Bottlenecks: An Empirical Study on Accelerating Google Edge Models
Amirali Boroumand
Saugata Ghose
Berkin Akin
Ravi Narayanaswami
Geraldo F. Oliveira
Xiaoyu Ma
Eric Shiu
O. Mutlu
27
29
0
01 Mar 2021
Characterization and recognition of handwritten digits using Julia
Md Asifuzzaman Jishan
M. Alam
A. Islam
I. R. Mazumder
K. Mahmud
A. K. Azad
24
0
0
24 Feb 2021
Enhanced Modality Transition for Image Captioning
Ziwei Wang
Yadan Luo
Zi Huang
13
0
0
23 Feb 2021
Progressive Transformer-Based Generation of Radiology Reports
Farhad Nooralahzadeh
Nicolas Andres Perez Gonzalez
T. Frauenfelder
Koji Fujimoto
Michael Krauthammer
ViT
MedIm
23
85
0
19 Feb 2021
Hierarchical Similarity Learning for Language-based Product Image Retrieval
Zhe Ma
Fenghao Liu
Jianfeng Dong
Xiaoye Qu
Yuan He
S. Ji
VLM
29
4
0
18 Feb 2021
Improved Bengali Image Captioning via deep convolutional neural network based encoder-decoder model
Mohammad Faiyaz Khan
S. M. S. Shifath
Md. Saiful Islam
VLM
35
18
0
14 Feb 2021
Image Captioning using Multiple Transformers for Self-Attention Mechanism
Farrukh Olimov
Shikha Dubey
Labina Shrestha
Tran Trung Tin
M. Jeon
ViT
34
2
0
14 Feb 2021
InsNet: An Efficient, Flexible, and Performant Insertion-based Text Generation Model
Sidi Lu
Tao Meng
Nanyun Peng
24
12
0
12 Feb 2021
In Defense of Scene Graphs for Image Captioning
Kien Nguyen
Subarna Tripathi
Bang Du
T. Guha
Truong Thao Nguyen
39
42
0
09 Feb 2021
Iconographic Image Captioning for Artworks
E. Cetinic
32
24
0
07 Feb 2021
RpBERT: A Text-image Relation Propagation-based BERT Model for Multimodal NER
Lin Sun
Jiquan Wang
Kai Zhang
Yindu Su
Fangsheng Weng
22
133
0
05 Feb 2021
L2C: Describing Visual Differences Needs Semantic Understanding of Individuals
An Yan
Xinze Wang
Tsu-Jui Fu
William Yang Wang
VLM
32
11
0
03 Feb 2021