arXiv: 1411.5726
CIDEr: Consensus-based Image Description Evaluation
20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
Papers citing "CIDEr: Consensus-based Image Description Evaluation"
50 / 2,145 papers shown
Efficient Image Captioning for Edge Devices
Ning Wang
Jiangrong Xie
Hangzai Luo
Qinglin Cheng
Jihao Wu
Mingbo Jia
Linlin Li
VLM
CLIP
30
20
0
18 Dec 2022
Semantics-Empowered Communication: A Tutorial-cum-Survey
Zhilin Lu
Rongpeng Li
Kun Lu
Xianfu Chen
Ekram Hossain
Zhifeng Zhao
Honggang Zhang
57
19
0
16 Dec 2022
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation
Yixin Liu
Alexander R. Fabbri
Pengfei Liu
Yilun Zhao
Linyong Nan
...
Simeng Han
Shafiq Joty
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
ALM
39
133
0
15 Dec 2022
NLIP: Noise-robust Language-Image Pre-training
Runhu Huang
Yanxin Long
Jianhua Han
Hang Xu
Xiwen Liang
Chunjing Xu
Xiaodan Liang
VLM
46
30
0
14 Dec 2022
ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes
Ahmed Abdelreheem
Kyle Olszewski
Hsin-Ying Lee
Peter Wonka
Panos Achlioptas
3DPC
35
28
0
12 Dec 2022
Contextual Explainable Video Representation: Human Perception-based Understanding
Khoa T. Vo
Kashu Yamazaki
Phong H. Nguyen
Pha Nguyen
Khoa Luu
Ngan Le
26
9
0
12 Dec 2022
Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue
Sunjae Yoon
Eunseop Yoon
Hee Suk Yoon
Junyeong Kim
Changdong Yoo
32
18
0
12 Dec 2022
REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
Ziniu Hu
Ahmet Iscen
Chen Sun
Zirui Wang
Kai-Wei Chang
Yizhou Sun
Cordelia Schmid
David A. Ross
Alireza Fathi
RALM
VLM
59
90
0
10 Dec 2022
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
Shen Yan
Tao Zhu
Zirui Wang
Yuan Cao
Mi Zhang
Soham Ghosh
Yonghui Wu
Jiahui Yu
VLM
VGen
39
47
0
09 Dec 2022
Open-world Story Generation with Structured Knowledge Enhancement: A Comprehensive Survey
Yuxin Wang
Jieru Lin
Zhiwei Yu
Wei Hu
Börje F. Karlsson
38
17
0
09 Dec 2022
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Jinze Bai
Rui Men
Han Yang
Xuancheng Ren
Kai Dang
...
Wenhang Ge
Jianxin Ma
Junyang Lin
Jingren Zhou
Chang Zhou
37
15
0
08 Dec 2022
Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations
Björn Plüster
Jakob Ambsdorf
Lukas Braach
Jae Hee Lee
S. Wermter
33
6
0
08 Dec 2022
Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning
Ukyo Honda
Taro Watanabe
Yuji Matsumoto
23
9
0
06 Dec 2022
Semantic-Conditional Diffusion Networks for Image Captioning
Jianjie Luo
Yehao Li
Yingwei Pan
Ting Yao
Jianlin Feng
Hongyang Chao
Tao Mei
DiffM
32
64
0
06 Dec 2022
Towards Generating Diverse Audio Captions via Adversarial Training
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM
46
2
0
05 Dec 2022
Controllable Image Captioning via Prompting
Ning Wang
Jiahao Xie
Jihao Wu
Mingbo Jia
Linlin Li
35
23
0
04 Dec 2022
Focus! Relevant and Sufficient Context Selection for News Image Captioning
Mingyang Zhou
Grace Luo
Anna Rohrbach
Zhou Yu
CLIP
35
13
0
01 Dec 2022
Long-Document Cross-Lingual Summarization
Shaohui Zheng
Zhixu Li
Jiaan Wang
Jianfeng Qu
An Liu
Lei Zhao
Zhigang Chen
RALM
47
9
0
01 Dec 2022
Uncertainty-Aware Image Captioning
Zhengcong Fei
Mingyuan Fan
Li Zhu
Junshi Huang
Xiaoming Wei
Xiaolin K. Wei
UQLM
28
10
0
30 Nov 2022
Task-Aware Asynchronous Multi-Task Model with Class Incremental Contrastive Learning for Surgical Scene Understanding
Lalithkumar Seenivasan
Mobarakol Islam
Mengya Xu
C. Lim
Hongliang Ren
35
3
0
28 Nov 2022
VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
Kashu Yamazaki
Khoa T. Vo
Sang Truong
Bhiksha Raj
Ngan Le
36
35
0
28 Nov 2022
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning
Xian Zhong
Zipeng Li
Shuqin Chen
Kui Jiang
Chen Chen
Mang Ye
DiffM
VGen
34
41
0
28 Nov 2022
CLID: Controlled-Length Image Descriptions with Limited Data
Elad Hirsch
A. Tal
VLM
3DV
27
4
0
27 Nov 2022
Aesthetically Relevant Image Captioning
Zhipeng Zhong
Fei Zhou
Guoping Qiu
44
9
0
25 Nov 2022
Language-Assisted 3D Feature Learning for Semantic Scene Understanding
Junbo Zhang
Guo Fan
Guanghan Wang
Zhèngyuān Sū
Kaisheng Ma
L. Yi
3DPC
32
7
0
25 Nov 2022
Retrieval-Augmented Multimodal Language Modeling
Michihiro Yasunaga
Armen Aghajanyan
Weijia Shi
Rich James
J. Leskovec
Percy Liang
M. Lewis
Luke Zettlemoyer
Wen-tau Yih
RALM
27
96
0
22 Nov 2022
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
Fenglin Liu
Xian Wu
Chenyu You
Shen Ge
Yuexian Zou
Xu Sun
31
23
0
22 Nov 2022
Exploring Discrete Diffusion Models for Image Captioning
Zixin Zhu
Yixuan Wei
Jianfeng Wang
Zhe Gan
Zheng Zhang
Le Wang
G. Hua
Lijuan Wang
Zicheng Liu
Han Hu
DiffM
VLM
36
18
0
21 Nov 2022
Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Peng Jin
Jinfa Huang
Fenglin Liu
Xian Wu
Shen Ge
Guoli Song
David Clifton
Jing Chen
VLM
73
64
0
21 Nov 2022
VER: Unifying Verbalizing Entities and Relations
Jie Huang
Kevin Chen-Chuan Chang
47
1
0
20 Nov 2022
A survey on knowledge-enhanced multimodal learning
Maria Lymperaiou
Giorgos Stamou
64
14
0
19 Nov 2022
Impact of visual assistance for automated audio captioning
Wim Boes
Hugo Van hamme
30
1
0
18 Nov 2022
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision
Sophia Gu
Christopher Clark
Aniruddha Kembhavi
VLM
32
24
0
17 Nov 2022
Visual Commonsense-aware Representation Network for Video Captioning
Pengpeng Zeng
Haonan Zhang
Lianli Gao
Xiangpeng Li
Jin Qian
Hengtao Shen
39
16
0
17 Nov 2022
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning
Pengpeng Zeng
Jinkuan Zhu
Jingkuan Song
Lianli Gao
VLM
29
27
0
17 Nov 2022
Feedback is Needed for Retakes: An Explainable Poor Image Notification Framework for the Visually Impaired
Kazuya Ohata
Shunsuke Kitada
Hitoshi Iyatomi
43
0
0
17 Nov 2022
CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge
Linli Yao
Wei Chen
Qin Jin
VLM
38
10
0
17 Nov 2022
Lesion Guided Explainable Few Weak-shot Medical Report Generation
Jinghan Sun
Dong Wei
Liansheng Wang
Yefeng Zheng
MedIm
29
13
0
16 Nov 2022
Parameter-Efficient Tuning on Layer Normalization for Pre-trained Language Models
Wang Qi
Yu-Ping Ruan
Y. Zuo
Taihao Li
32
18
0
16 Nov 2022
PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu
Hang Hua
Zhengyuan Yang
Weijia Shi
Noah A. Smith
Jiebo Luo
68
101
0
15 Nov 2022
kogito: A Commonsense Knowledge Inference Toolkit
Mete Ismayilzada
Antoine Bosselut
35
7
0
15 Nov 2022
Will Large-scale Generative Models Corrupt Future Datasets?
Ryuichiro Hataya
Han Bao
Hiromi Arai
27
55
0
15 Nov 2022
A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation
Shijia Huang
Feng Li
Hao Zhang
Siyi Liu
Lei Zhang
Liwei Wang
35
5
0
15 Nov 2022
Is my automatic audio captioning system so bad? spider-max: a metric to consider several caption candidates
Etienne Labbé
Thomas Pellegrini
J. Pinquier
22
4
0
14 Nov 2022
Why Did the Chicken Cross the Road? Rephrasing and Analyzing Ambiguous Questions in VQA
Elias Stengel-Eskin
Jimena Guallar-Blasco
Yi Zhou
Benjamin Van Durme
UQLM
40
11
0
14 Nov 2022
Zero-shot Image Captioning by Anchor-augmented Vision-Language Space Alignment
Junyan Wang
Yi Zhang
Ming Yan
Ji Zhang
Jitao Sang
VLM
41
9
0
14 Nov 2022
Large-Scale Bidirectional Training for Zero-Shot Image Captioning
Taehoon Kim
Mark A Marsden
Pyunghwan Ahn
Sangyun Kim
Sihaeng Lee
Alessandra Sala
S. Kim
VLM
45
4
0
13 Nov 2022
DeltaNet: Conditional Medical Report Generation for COVID-19 Diagnosis
Xian Wu
Shuxin Yang
Zhaopeng Qiu
Shen Ge
Yangtian Yan
Xingwang Wu
Yefeng Zheng
S. Kevin Zhou
Li Xiao
MedIm
48
20
0
12 Nov 2022
Investigations in Audio Captioning: Addressing Vocabulary Imbalance and Evaluating Suitability of Language-Centric Performance Metrics
Sandeep Reddy Kothinti
Dimitra Emmanouilidou
14
3
0
12 Nov 2022
VieCap4H-VLSP 2021: ObjectAoA-Enhancing performance of Object Relation Transformer with Attention on Attention for Vietnamese image captioning
Nghia Hieu Nguyen
Duong T.D. Vo
Minh-Quan Ha
ViT
40
1
0
10 Nov 2022