Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1411.5726
Cited By
CIDEr: Consensus-based Image Description Evaluation
20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
Re-assign community
ArXiv
PDF
HTML
Papers citing
"CIDEr: Consensus-based Image Description Evaluation"
50 / 2,142 papers shown
Title
Thinking Hallucination for Video Captioning
Nasib Ullah
Partha Pratim Mohanta
VLM
38
4
0
28 Sep 2022
Improving Radiology Report Generation Systems by Removing Hallucinated References to Non-existent Priors
Vignav Ramesh
Nathan Chi
Pranav Rajpurkar
MedIm
36
49
0
27 Sep 2022
Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned
Ahmed Sabir
25
0
0
26 Sep 2022
DRAMA: Joint Risk Localization and Captioning in Driving
Srikanth Malla
Chiho Choi
Isht Dwivedi
Joonhyang Choi
Jiachen Li
107
88
0
22 Sep 2022
INFINITY: A Simple Yet Effective Unsupervised Framework for Graph-Text Mutual Conversion
Yi Xu
Luoyi Fu
Zhouhan Lin
Jiexing Qi
Xinbing Wang
48
3
0
22 Sep 2022
Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia
K. Nguyen
Ali Furkan Biten
Andrés Mafla
Lluís Gómez
Dimosthenis Karatzas
38
10
0
21 Sep 2022
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering
Hao Li
Jinfa Huang
Peng Jin
Guoli Song
Qi Wu
Jie Chen
44
21
0
21 Sep 2022
Recipe Generation from Unsegmented Cooking Videos
Taichi Nishimura
Atsushi Hashimoto
Yoshitaka Ushiku
Hirotaka Kameko
Shinsuke Mori
25
3
0
21 Sep 2022
Learning Distinct and Representative Styles for Image Captioning
Qi Chen
Chaorui Deng
Qi Wu
VLM
45
23
0
17 Sep 2022
Belief Revision based Caption Re-ranker with Visual Semantic Information
Ahmed Sabir
Francesc Moreno-Noguer
Pranava Madhyastha
Lluís Padró
BDL
37
2
0
16 Sep 2022
Distribution Aware Metrics for Conditional Natural Language Generation
David M. Chan
Yiming Ni
David A. Ross
Sudheendra Vijayanarasimhan
Austin Myers
John F. Canny
53
4
0
15 Sep 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen
Tianlin Li
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLM
VLM
37
691
0
14 Sep 2022
Automatic Comment Generation via Multi-Pass Deliberation
Fangwen Mu
Xiao Chen
Lin Shi
Song Wang
Qing Wang
47
12
0
14 Sep 2022
PreSTU: Pre-Training for Scene-Text Understanding
Jihyung Kil
Soravit Changpinyo
Xi Chen
Hexiang Hu
Sebastian Goodman
Wei-Lun Chao
Radu Soricut
VLM
145
29
0
12 Sep 2022
MaXM: Towards Multilingual Visual Question Answering
Soravit Changpinyo
Linting Xue
Michal Yarom
Ashish V. Thapliyal
Idan Szpektor
J. Amelot
Xi Chen
Radu Soricut
36
8
0
12 Sep 2022
Evaluation of Question Answering Systems: Complexity of judging a natural language
Amer Farea
Zhen Yang
Kien Duong
Nadeesha Perera
F. Emmert-Streib
ELM
42
3
0
10 Sep 2022
Bridging Music and Text with Crowdsourced Music Comments: A Sequence-to-Sequence Framework for Thematic Music Comments Generation
Peining Zhang
Junliang Guo
Linli Xu
Mu You
Junming Yin
27
0
0
05 Sep 2022
On Grounded Planning for Embodied Tasks with Language Models
Bill Yuchen Lin
Chengsong Huang
Qian Liu
Wenda Gu
Sam Sommerer
Xiang Ren
LM&Ro
36
39
0
29 Aug 2022
Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation
Cyril Chhun
Pierre Colombo
Chloé Clavel
Fabian M. Suchanek
60
51
0
24 Aug 2022
Improving Personality Consistency in Conversation by Persona Extending
Yifan Liu
Wei Wei
Jiayi Liu
Xian-Ling Mao
Rui Fang
Dangyang Chen
35
24
0
23 Aug 2022
A Medical Semantic-Assisted Transformer for Radiographic Report Generation
Zhanyu Wang
Mingkang Tang
Lei Wang
Xiu Li
Luping Zhou
ViT
MedIm
29
57
0
22 Aug 2022
Diverse Video Captioning by Adaptive Spatio-temporal Attention
Zohreh Ghaderi
Leonard Salewski
Hendrik P. A. Lensch
18
8
0
19 Aug 2022
An investigation on selecting audio pre-trained models for audio captioning
Peiran Yan
Sheng-Wei Li
26
0
0
12 Aug 2022
A Comprehensive Survey of Natural Language Generation Advances from the Perspective of Digital Deception
Keenan I. Jones
Enes ALTUNCU
V. N. Franqueira
Yi-Chia Wang
Shujun Li
DeLMO
47
3
0
11 Aug 2022
Sports Video Analysis on Large-Scale Data
Dekun Wu
Henghui Zhao
Xingce Bao
Richard P. Wildes
29
13
0
09 Aug 2022
Distinctive Image Captioning via CLIP Guided Group Optimization
Youyuan Zhang
Jiuniu Wang
Hao Wu
Wenjia Xu
VLM
40
8
0
08 Aug 2022
Prompt Tuning for Generative Multimodal Pretrained Models
Han Yang
Junyang Lin
An Yang
Peng Wang
Chang Zhou
Hongxia Yang
VLM
LRM
VPVLM
37
30
0
04 Aug 2022
SMART: Sentences as Basic Units for Text Evaluation
Reinald Kim Amplayo
Peter J. Liu
Yao-Min Zhao
Shashi Narayan
38
21
0
01 Aug 2022
MAFW: A Large-scale, Multi-modal, Compound Affective Database for Dynamic Facial Expression Recognition in the Wild
Y. Liu
Wei Dai
Chuanxu Feng
Wenbin Wang
Guanghao Yin
Jiabei Zeng
Shiguang Shan
CVBM
32
62
0
01 Aug 2022
Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base
Jinyeong Chae
Jihie Kim
27
2
0
27 Jul 2022
Retrieval-Augmented Transformer for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
29
57
0
26 Jul 2022
Is GPT-3 all you need for Visual Question Answering in Cultural Heritage?
P. Bongini
Federico Becattini
A. Bimbo
12
13
0
25 Jul 2022
Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations
Qian Yang
Yunxin Li
Baotian Hu
Lin Ma
Yuxin Ding
Min Zhang
47
10
0
23 Jul 2022
Rethinking the Reference-based Distinctive Image Captioning
Yangjun Mao
Long Chen
Zhihong Jiang
Dong Zhang
Zhimeng Zhang
Jian Shao
Jun Xiao
DiffM
35
22
0
22 Jul 2022
Zero-Shot Video Captioning with Evolving Pseudo-Tokens
Yoad Tewel
Yoav Shalev
Roy Nadler
Idan Schwartz
Lior Wolf
37
27
0
22 Jul 2022
Efficient Modeling of Future Context for Image Captioning
Zhengcong Fei
Junshi Huang
Xiaoming Wei
Xiaolin K. Wei
50
14
0
22 Jul 2022
Grounding Visual Representations with Texts for Domain Generalization
Seonwoo Min
Nokyung Park
Siwon Kim
Seunghyun Park
Jinkyu Kim
OOD
19
34
0
21 Jul 2022
Diffsound: Discrete Diffusion Model for Text-to-sound Generation
Dongchao Yang
Jianwei Yu
Helin Wang
Wen Wang
Chao Weng
Yuexian Zou
Dong Yu
DiffM
36
297
0
20 Jul 2022
GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features
Van-Quang Nguyen
Masanori Suganuma
Takayuki Okatani
ViT
41
107
0
20 Jul 2022
Explicit Image Caption Editing
Zhen Wang
Long Chen
Wenbo Ma
G. Han
Yulei Niu
Jian Shao
Jun Xiao
25
12
0
20 Jul 2022
Relational Future Captioning Model for Explaining Likely Collisions in Daily Tasks
Motonari Kambara
K. Sugiura
32
6
0
19 Jul 2022
Unifying Event Detection and Captioning as Sequence Generation via Pre-Training
Qi Zhang
Yuqing Song
Qin Jin
30
24
0
18 Jul 2022
Towards the Human Global Context: Does the Vision-Language Model Really Judge Like a Human Being?
Sangmyeong Woh
Jaemin Lee
Hoki Kim
Jinsuk Lee
21
0
0
18 Jul 2022
Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation
Chao Zheng
Lianli Gao
Xinyu Lyu
Pengpeng Zeng
Abdulmotaleb El Saddik
Hengtao Shen
37
14
0
16 Jul 2022
LineCap: Line Charts for Data Visualization Captioning Models
Anita Mahinpei
Zona Kostic
Christy Tanner
VLM
34
17
0
15 Jul 2022
A Baseline for Detecting Out-of-Distribution Examples in Image Captioning
Gabi Shalev
Gal-Lev Shalev
Joseph Keshet
OODD
29
7
0
12 Jul 2022
Cross-modal Prototype Driven Network for Radiology Report Generation
Jun Wang
A. Bhalerao
Yulan He
MedIm
93
73
0
11 Jul 2022
Adaptive Fine-Grained Predicates Learning for Scene Graph Generation
Xinyu Lyu
Lianli Gao
Pengpeng Zeng
Hengtao Shen
Jingkuan Song
44
18
0
11 Jul 2022
Predicting Word Learning in Children from the Performance of Computer Vision Systems
Sunayana Rane
Mira L. Nencheva
Zeyu Wang
C. Lew‐Williams
Olga Russakovsky
Thomas Griffiths
21
3
0
07 Jul 2022
Exploring the sequence length bottleneck in the Transformer for Image Captioning
Jiapeng Hu
Roberto Cavicchioli
Alessandro Capotondi
ViT
43
3
0
07 Jul 2022
Previous
1
2
3
...
21
22
23
...
41
42
43
Next