Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1607.08822
Cited By
SPICE: Semantic Propositional Image Caption Evaluation
29 July 2016
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"SPICE: Semantic Propositional Image Caption Evaluation"
50 / 949 papers shown
Title
NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge
D. Vo
Hong Chen
Akihiro Sugimoto
Hideki Nakayama
124
14
0
28 Mar 2022
Affective Feedback Synthesis Towards Multimodal Text and Image Data
Puneet Kumar
Gaurav Bhatt
Omkar Ingle
Daksh Goyal
Balasubramanian Raman
EGVM
61
4
0
23 Mar 2022
Global2Local: A Joint-Hierarchical Attention for Video Captioning
Chengpeng Dai
Fuhai Chen
Xiaoshuai Sun
Rongrong Ji
QiXiang Ye
Yongjian Wu
79
1
0
13 Mar 2022
Taking an Emotional Look at Video Paragraph Captioning
Qinyu Li
Tengpeng Li
Hanli Wang
Changan Chen
39
5
0
12 Mar 2022
REX: Reasoning-aware and Grounded Explanation
Shi Chen
Qi Zhao
89
18
0
11 Mar 2022
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks
Fawaz Sammani
Tanmoy Mukherjee
Nikos Deligiannis
MILM
ELM
LRM
130
68
0
09 Mar 2022
Geodesic Multi-Modal Mixup for Robust Fine-Tuning
Changdae Oh
Junhyuk So
Hoyoon Byun
Yongtaek Lim
Minchul Shin
Jong-June Jeon
Kyungwoo Song
139
30
0
08 Mar 2022
Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition
Peipei Zhu
Tianlin Li
Yong Luo
Zhenglong Sun
Wei-Shi Zheng
Yaowei Wang
Chen Chen
102
12
0
07 Mar 2022
Leveraging Pre-trained BERT for Audio Captioning
Xubo Liu
Xinhao Mei
Qiushi Huang
Jianyuan Sun
Jinzheng Zhao
Haohe Liu
Mark D. Plumbley
Volkan Kilicc
Wenwu Wang
115
30
0
06 Mar 2022
FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context
Pinaki Nath Chowdhury
Aneeshan Sain
A. Bhunia
Tao Xiang
Yulia Gryaditskaya
Yi-Zhe Song
3DV
89
54
0
04 Mar 2022
Vision-Language Intelligence: Tasks, Representation Learning, and Large Models
Feng Li
Hao Zhang
Yi-Fan Zhang
Shixuan Liu
Jian Guo
L. Ni
Pengchuan Zhang
Lei Zhang
AI4TS
VLM
74
37
0
03 Mar 2022
CaMEL: Mean Teacher Learning for Image Captioning
Manuele Barraco
Matteo Stefanini
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
ViT
VLM
78
30
0
21 Feb 2022
I-Tuning: Tuning Frozen Language Models with Image for Lightweight Image Captioning
Ziyang Luo
Zhipeng Hu
Yadong Xi
Rongsheng Zhang
Jing Ma
VLM
52
14
0
14 Feb 2022
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models
Jaemin Cho
Abhaysinh Zala
Joey Tianyi Zhou
ViT
241
193
0
08 Feb 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLM
ObjD
167
883
0
07 Feb 2022
Joint Speech Recognition and Audio Captioning
Chaitanya Narisetty
E. Tsunoo
Xuankai Chang
Yosuke Kashiwagi
Michael Hentschel
Shinji Watanabe
47
10
0
03 Feb 2022
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi
H. Pourreza
H. Mahyar
VLM
130
101
0
31 Jan 2022
A Frustratingly Simple Approach for End-to-End Image Captioning
Ziyang Luo
Yadong Xi
Rongsheng Zhang
Jing Ma
VLM
MLLM
70
16
0
30 Jan 2022
MVPTR: Multi-Level Semantic Alignment for Vision-Language Pre-Training via Multi-Stage Learning
Zejun Li
Zhihao Fan
Huaixiao Tou
Jingjing Chen
Zhongyu Wei
Xuanjing Huang
78
18
0
29 Jan 2022
BERTHA: Video Captioning Evaluation Via Transfer-Learned Human Assessment
Luis Lebron
Yvette Graham
Kevin McGuinness
K. Kouramas
Noel E. O'Connor
80
3
0
25 Jan 2022
Local Information Assisted Attention-free Decoder for Audio Captioning
Feiyang Xiao
Jian Guan
Haiyan Lan
Qiaoxi Zhu
Wenwu Wang
98
11
0
10 Jan 2022
Compact Bidirectional Transformer for Image Captioning
Yuanen Zhou
Zhenzhen Hu
Daqing Liu
Huixia Ben
Meng Wang
VLM
67
16
0
06 Jan 2022
StyleM: Stylized Metrics for Image Captioning Built with Contrastive N-grams
Chengxi Li
Brent Harrison
103
3
0
04 Jan 2022
LaTr: Layout-Aware Transformer for Scene-Text VQA
Ali Furkan Biten
Ron Litman
Yusheng Xie
Srikar Appalaraju
R. Manmatha
ViT
123
102
0
23 Dec 2021
ScanQA: 3D Question Answering for Spatial Scene Understanding
Daich Azuma
Taiki Miyanishi
Shuhei Kurita
M. Kawanabe
94
208
0
20 Dec 2021
Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs
Shengyu Feng
Subarna Tripathi
Hesham Mostafa
Marcel Nassar
Somdeb Majumdar
94
26
0
18 Dec 2021
NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics
Ximing Lu
Sean Welleck
Peter West
Liwei Jiang
Jungo Kasai
...
Lianhui Qin
Youngjae Yu
Rowan Zellers
Noah A. Smith
Yejin Choi
70
165
0
16 Dec 2021
KGR^4: Retrieval, Retrospect, Refine and Rethink for Commonsense Generation
Xin Liu
Dayiheng Liu
Baosong Yang
Haibo Zhang
Junwei Ding
Wenqing Yao
Weihua Luo
Haiying Zhang
Jinsong Su
LRM
61
8
0
15 Dec 2021
MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning
Wenqiao Zhang
Haochen Shi
Jiannan Guo
Shengyu Zhang
Qingpeng Cai
Juncheng Li
Sihui Luo
Yueting Zhuang
DiffM
100
46
0
13 Dec 2021
Contextualized Scene Imagination for Generative Commonsense Reasoning
Peifeng Wang
Jonathan Zamora
Junfeng Liu
Filip Ilievski
Muhao Chen
Xiang Ren
ReLM
LRM
95
16
0
12 Dec 2021
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation
Tianyi Liu
Zuxuan Wu
Wenhan Xiong
Jingjing Chen
Yu-Gang Jiang
VLM
MLLM
88
10
0
10 Dec 2021
Injecting Semantic Concepts into End-to-End Image Captioning
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lin Liang
Zhe Gan
Lijuan Wang
Yezhou Yang
Zicheng Liu
ViT
VLM
79
91
0
09 Dec 2021
Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Lavinia Dunagan
Jacob Morrison
Alexander R. Fabbri
Yejin Choi
Noah A. Smith
97
40
0
08 Dec 2021
Protecting Intellectual Property of Language Generation APIs with Lexical Watermark
Xuanli He
Xingliang Yuan
Lingjuan Lyu
Fangzhao Wu
Chenguang Wang
WaLM
244
98
0
05 Dec 2021
Consensus Graph Representation Learning for Better Grounded Image Captioning
Wenqiao Zhang
Haochen Shi
Siliang Tang
Jun Xiao
Qiang Yu
Yueting Zhuang
74
56
0
02 Dec 2021
Object-Centric Unsupervised Image Captioning
Zihang Meng
David Yang
Xuefei Cao
Ashish Shah
Ser-Nam Lim
OCL
VLM
78
12
0
02 Dec 2021
Relational Graph Learning for Grounded Video Description Generation
Wenqiao Zhang
Xinze Wang
Siliang Tang
Haizhou Shi
Haochen Shi
Jun Xiao
Yueting Zhuang
Wenjie Wang
62
33
0
02 Dec 2021
Neural Attention for Image Captioning: Review of Outstanding Methods
Zanyar Zohourianshahzadi
Jugal Kalita
VLM
86
47
0
29 Nov 2021
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Yoad Tewel
Yoav Shalev
Idan Schwartz
Lior Wolf
VLM
122
196
0
29 Nov 2021
Less is More: Generating Grounded Navigation Instructions from Landmarks
Su Wang
Ceslee Montgomery
Jordi Orbay
Vighnesh Birodkar
Aleksandra Faust
Izzeddin Gur
Natasha Jaques
Austin Waters
Jason Baldridge
Peter Anderson
135
64
0
25 Nov 2021
Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets
Marcella Cornia
Lorenzo Baraldi
G. Fiameni
Rita Cucchiara
109
12
0
24 Nov 2021
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Faisal Ahmed
Zicheng Liu
Yumao Lu
Lijuan Wang
138
117
0
23 Nov 2021
RedCaps: web-curated image-text data created by the people, for the people
Karan Desai
Gaurav Kaul
Zubin Aysola
Justin Johnson
135
169
0
22 Nov 2021
L-Verse: Bidirectional Generation Between Image and Text
Taehoon Kim
Gwangmo Song
Sihaeng Lee
Sangyun Kim
Yewon Seo
Soonyoung Lee
S. Kim
Honglak Lee
Kyunghoon Bae
138
26
0
22 Nov 2021
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
Jianfeng Wang
Xiaowei Hu
Zhe Gan
Zhengyuan Yang
Xiyang Dai
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
73
57
0
19 Nov 2021
ClipCap: CLIP Prefix for Image Captioning
Ron Mokady
Amir Hertz
Amit H. Bermano
CLIP
VLM
78
682
0
18 Nov 2021
Transparent Human Evaluation for Image Captioning
Jungo Kasai
Keisuke Sakaguchi
Lavinia Dunagan
Jacob Morrison
Ronan Le Bras
Yejin Choi
Noah A. Smith
82
49
0
17 Nov 2021
Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval
Zhihao Fan
Zhongyu Wei
Zejun Li
Siyuan Wang
Jianqing Fan
53
7
0
05 Nov 2021
Automatic Knowledge Augmentation for Generative Commonsense Reasoning
Jaehyung Seo
Chanjun Park
Sugyeong Eo
Hyeonseok Moon
Heuiseok Lim
ReLM
LRM
43
3
0
30 Oct 2021
Bangla Image Caption Generation through CNN-Transformer based Encoder-Decoder Network
Yuansan Liu
MD Abdullah Al Nasim
Sourav Saha
Faria Afrin
Raisa Mallik
Sathishkumar Samiappan
ViT
41
14
0
24 Oct 2021
Previous
1
2
3
...
10
11
12
...
17
18
19
Next