Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1411.5726
Cited By
v1
v2 (latest)
CIDEr: Consensus-based Image Description Evaluation
20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"CIDEr: Consensus-based Image Description Evaluation"
50 / 2,184 papers shown
Title
Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer Using Patches
Mengya Xu
Mobarakol Islam
Hongliang Ren
MedIm
77
12
0
30 Jun 2022
ZoDIAC: Zoneout Dropout Injection Attention Calculation
Zanyar Zohourianshahzadi
Jugal Kalita
96
0
0
28 Jun 2022
VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning
Kashu Yamazaki
Sang Truong
Khoa T. Vo
Michael Kidd
Chase Rainwater
Khoa Luu
Ngan Le
VLM
CoGe
65
26
0
26 Jun 2022
MVP: Multi-task Supervised Pre-training for Natural Language Generation
Tianyi Tang
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
120
24
0
24 Jun 2022
Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformer
Lalithkumar Seenivasan
Mobarakol Islam
Adithya K. Krishna
Hongliang Ren
MedIm
79
48
0
22 Jun 2022
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
...
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
295
1,134
0
22 Jun 2022
Bypass Network for Semantics Driven Image Paragraph Captioning
Qinjie Zheng
Chaoyue Wang
Dadong Wang
122
1
0
21 Jun 2022
REVECA -- Rich Encoder-decoder framework for Video Event CAptioner
Jaehyuk Heo
YongGi Jeong
Sunwoo Kim
Jaehee Kim
Pilsung Kang
28
0
0
18 Jun 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Yogesh S Rawat
M. Shah
SSL
132
136
0
18 Jun 2022
Image Captioning based on Feature Refinement and Reflective Decoding
G. Alabduljabbar
Hafida Benhidour
Said Kerrache
3DV
31
3
0
16 Jun 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM
ObjD
115
130
0
15 Jun 2022
Measuring Representational Harms in Image Captioning
Angelina Wang
Solon Barocas
Kristen Laird
Hanna M. Wallach
120
55
0
14 Jun 2022
Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger
Zhiqi Bu
Yu Wang
Sheng Zha
George Karypis
139
72
0
14 Jun 2022
Comprehending and Ordering Semantics for Image Captioning
Yehao Li
Yingwei Pan
Ting Yao
Tao Mei
82
92
0
14 Jun 2022
Language Models are General-Purpose Interfaces
Y. Hao
Haoyu Song
Li Dong
Shaohan Huang
Zewen Chi
Wenhui Wang
Shuming Ma
Furu Wei
MLLM
78
102
0
13 Jun 2022
CoSe-Co: Text Conditioned Generative CommonSense Contextualizer
Rachit Bansal
Milan Aggarwal
S. Bhatia
Jivat Neet Kaur
Balaji Krishnamurthy
43
4
0
12 Jun 2022
Bridging the Gap Between Training and Inference of Bayesian Controllable Language Models
Han Liu
Bingning Wang
Ting Yao
Haijin Liang
Jianjin Xu
Xiaolin Hu
BDL
70
1
0
11 Jun 2022
Improving Image Captioning with Control Signal of Sentence Quality
Zhangzi Zhu
Hong Qu
83
0
0
07 Jun 2022
Intra-agent speech permits zero-shot task acquisition
Chen Yan
Federico Carnevale
Petko Georgiev
Adam Santoro
Aurelia Guy
Alistair Muldal
Chia-Chun Hung
Josh Abramson
Timothy Lillicrap
Greg Wayne
LM&Ro
97
9
0
07 Jun 2022
Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation
Mingjie Li
Wenjia Cai
Karin Verspoor
Shirui Pan
Xiaodan Liang
Xiaojun Chang
MedIm
88
38
0
04 Jun 2022
Automated Audio Captioning with Epochal Difficult Captions for Curriculum Learning
Andrew Koh
Soham Dinesh Tiwari
Chng Eng Siong
53
1
0
04 Jun 2022
Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning
Yujia Xie
Luowei Zhou
Xiyang Dai
Lu Yuan
Nguyen Bach
Ce Liu
Michael Zeng
VLM
MLLM
71
28
0
03 Jun 2022
On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting
Tomasz Korbak
Hady ElSahar
Germán Kruszewski
Marc Dymetman
CLL
105
57
0
01 Jun 2022
CLIP4IDC: CLIP for Image Difference Captioning
Zixin Guo
Tong Wang
Jorma T. Laaksonen
VLM
76
30
0
01 Jun 2022
HierarchyNet: Learning to Summarize Source Code with Heterogeneous Representations
Minh Huynh Nguyen
Nghi D. Q. Bui
Truong-Son Hy
Long Tran-Thanh
Tien N. Nguyen
69
6
0
31 May 2022
VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models
Wangchunshu Zhou
Yan Zeng
Shizhe Diao
Xinsong Zhang
CoGe
VLM
101
13
0
30 May 2022
BAN-Cap: A Multi-Purpose English-Bangla Image Descriptions Dataset
Mohammad Faiyaz Khan
S. M. S. Shifath
Md. Saiful Islam
51
6
0
28 May 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
178
563
0
27 May 2022
Revisiting Generative Commonsense Reasoning: A Pre-Ordering Approach
Chao Zhao
Faeze Brahman
Tenghao Huang
Snigdha Chaturvedi
LRM
74
5
0
26 May 2022
Prompt-based Learning for Unpaired Image Captioning
Peipei Zhu
Tianlin Li
Lin Zhu
Zhenglong Sun
Weishi Zheng
Yaowei Wang
Chen Chen
VLM
97
33
0
26 May 2022
Fine-grained Image Captioning with CLIP Reward
Jaemin Cho
Seunghyun Yoon
Ajinkya Kale
Franck Dernoncourt
Trung Bui
Joey Tianyi Zhou
CLIP
234
79
0
26 May 2022
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning
Prakhar Gupta
Cathy Jiao
Yi-Ting Yeh
Shikib Mehri
M. Eskénazi
Jeffrey P. Bigham
ALM
119
48
0
25 May 2022
Multimodal Knowledge Alignment with Reinforcement Learning
Youngjae Yu
Jiwan Chung
Heeseung Yun
Jack Hessel
Jinho Park
...
Prithviraj Ammanabrolu
Rowan Zellers
Ronan Le Bras
Gunhee Kim
Yejin Choi
VLM
160
37
0
25 May 2022
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models
Jin-Hwa Kim
Yunji Kim
Jiyoung Lee
Kang Min Yoo
Sang-Woo Lee
EGVM
108
35
0
25 May 2022
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset
Ashish V. Thapliyal
Jordi Pont-Tuset
Xi Chen
Radu Soricut
VGen
175
78
0
25 May 2022
TempLM: Distilling Language Models into Template-Based Generators
Tianyi Zhang
Mina Lee
Lisa Li
Ende Shen
Tatsunori B. Hashimoto
VLM
95
5
0
23 May 2022
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Zhenhailong Wang
Manling Li
Ruochen Xu
Luowei Zhou
Jie Lei
...
Chenguang Zhu
Derek Hoiem
Shih-Fu Chang
Joey Tianyi Zhou
Heng Ji
MLLM
VLM
225
142
0
22 May 2022
GL-RG: Global-Local Representation Granularity for Video Captioning
Liqi Yan
Qifan Wang
Yiming Cui
Fuli Feng
Xiaojun Quan
Xinming Zhang
Dongfang Liu
119
59
0
22 May 2022
Context Matters for Image Descriptions for Accessibility: Challenges for Referenceless Evaluation Metrics
Elisa Kreiss
Cynthia L. Bennett
Shayan Hooshmand
E. Zelikman
Meredith Ringel Morris
Christopher Potts
83
27
0
21 May 2022
What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
Bryan Seybold
John F. Canny
73
6
0
12 May 2022
Automated Audio Captioning: An Overview of Recent Progress and New Challenges
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
113
44
0
12 May 2022
Explainable Deep Learning Methods in Medical Image Classification: A Survey
Cristiano Patrício
João C. Neves
Luís F. Teixeira
XAI
96
59
0
10 May 2022
RoViST:Learning Robust Metrics for Visual Storytelling
Eileen Wang
S. Han
Josiah Poon
49
10
0
08 May 2022
Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information
Zhipeng Zhang
Xinglin Hou
K. Niu
Zhongzhen Huang
T. Ge
Yuning Jiang
Qi Wu
Peifeng Wang
67
5
0
07 May 2022
Language Models Can See: Plugging Visual Controls in Text Generation
Yixuan Su
Tian Lan
Yahui Liu
Fangyu Liu
Dani Yogatama
Yan Wang
Lingpeng Kong
Nigel Collier
VLM
MLLM
113
98
0
05 May 2022
Towards Robust and Semantically Organised Latent Representations for Unsupervised Text Style Transfer
Vivian Lai
Ruijia Cheng
Wenjuan Zhang
74
13
0
04 May 2022
Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos
Arnav Chakravarthy
Zhiyuan Fang
Yezhou Yang
71
2
0
28 Apr 2022
Controllable Image Captioning
Luka Maxwell
110
0
0
28 Apr 2022
CapOnImage: Context-driven Dense-Captioning on Image
Yiqi Gao
Xinglin Hou
Yuanmeng Zhang
T. Ge
Yuning Jiang
Peifeng Wang
139
11
0
27 Apr 2022
SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text
Pinaki Nath Chowdhury
A. Bhunia
Aneeshan Sain
Subhadeep Koley
Tao Xiang
Yi-Zhe Song
100
30
0
25 Apr 2022
Previous
1
2
3
...
23
24
25
...
42
43
44
Next