CIDEr: Consensus-based Image Description Evaluation

20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
arXiv: 1411.5726

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,184 papers shown
Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations
Björn Plüster
Jakob Ambsdorf
Lukas Braach
Jae Hee Lee
S. Wermter
78
6
0
08 Dec 2022
Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning
Ukyo Honda
Taro Watanabe
Yuji Matsumoto
63
9
0
06 Dec 2022
Semantic-Conditional Diffusion Networks for Image Captioning
Jianjie Luo
Yehao Li
Yingwei Pan
Ting Yao
Jianlin Feng
Hongyang Chao
Tao Mei
DiffM
91
74
0
06 Dec 2022
Towards Generating Diverse Audio Captions via Adversarial Training
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM
88
2
0
05 Dec 2022
Controllable Image Captioning via Prompting
Ning Wang
Jiahao Xie
Jihao Wu
Mingbo Jia
Linlin Li
64
24
0
04 Dec 2022
Focus! Relevant and Sufficient Context Selection for News Image Captioning
Mingyang Zhou
Grace Luo
Anna Rohrbach
Zhou Yu
CLIP
75
13
0
01 Dec 2022
Long-Document Cross-Lingual Summarization
Shaohui Zheng
Zhixu Li
Jiaan Wang
Jianfeng Qu
An Liu
Lei Zhao
Zhigang Chen
RALM
113
9
0
01 Dec 2022
Uncertainty-Aware Image Captioning
Zhengcong Fei
Mingyuan Fan
Li Zhu
Junshi Huang
Xiaoming Wei
Xiaolin K. Wei
UQLM
69
13
0
30 Nov 2022
Task-Aware Asynchronous Multi-Task Model with Class Incremental Contrastive Learning for Surgical Scene Understanding
Lalithkumar Seenivasan
Mobarakol Islam
Mengya Xu
C. Lim
Hongliang Ren
62
4
0
28 Nov 2022
VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
Kashu Yamazaki
Khoa T. Vo
Sang Truong
Bhiksha Raj
Ngan Le
80
38
0
28 Nov 2022
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning
Xian Zhong
Zipeng Li
Shuqin Chen
Kui Jiang
Chen Chen
Mang Ye
DiffM, VGen
105
43
0
28 Nov 2022
CLID: Controlled-Length Image Descriptions with Limited Data
Elad Hirsch
A. Tal
VLM, 3DV
60
4
0
27 Nov 2022
Aesthetically Relevant Image Captioning
Zhipeng Zhong
Fei Zhou
Guoping Qiu
64
9
0
25 Nov 2022
Language-Assisted 3D Feature Learning for Semantic Scene Understanding
Junbo Zhang
Guo Fan
Guanghan Wang
Zhèngyuān Sū
Kaisheng Ma
L. Yi
3DPC
78
7
0
25 Nov 2022
Retrieval-Augmented Multimodal Language Modeling
Michihiro Yasunaga
Armen Aghajanyan
Weijia Shi
Rich James
J. Leskovec
Percy Liang
M. Lewis
Luke Zettlemoyer
Wen-tau Yih
RALM
106
108
0
22 Nov 2022
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
Fenglin Liu
Xian Wu
Chenyu You
Shen Ge
Yuexian Zou
Xu Sun
95
25
0
22 Nov 2022
Exploring Discrete Diffusion Models for Image Captioning
Zixin Zhu
Yixuan Wei
Jianfeng Wang
Zhe Gan
Zheng Zhang
Le Wang
G. Hua
Lijuan Wang
Zicheng Liu
Han Hu
DiffM, VLM
100
24
0
21 Nov 2022
Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Peng Jin
Jinfa Huang
Fenglin Liu
Xian Wu
Shen Ge
Guoli Song
David Clifton
Jing Chen
VLM
109
69
0
21 Nov 2022
VER: Unifying Verbalizing Entities and Relations
Jie Huang
Kevin Chen-Chuan Chang
114
1
0
20 Nov 2022
A survey on knowledge-enhanced multimodal learning
Maria Lymperaiou
Giorgos Stamou
174
15
0
19 Nov 2022
Impact of visual assistance for automated audio captioning
Wim Boes
Hugo Van hamme
65
1
0
18 Nov 2022
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision
Sophia Gu
Christopher Clark
Aniruddha Kembhavi
VLM
72
26
0
17 Nov 2022
Visual Commonsense-aware Representation Network for Video Captioning
Pengpeng Zeng
Haonan Zhang
Lianli Gao
Xiangpeng Li
Jin Qian
Hengtao Shen
85
18
0
17 Nov 2022
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning
Pengpeng Zeng
Jinkuan Zhu
Jingkuan Song
Lianli Gao
VLM
63
30
0
17 Nov 2022
Feedback is Needed for Retakes: An Explainable Poor Image Notification Framework for the Visually Impaired
Kazuya Ohata
Shunsuke Kitada
Hitoshi Iyatomi
65
0
0
17 Nov 2022
CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge
Linli Yao
Wei Chen
Qin Jin
VLM
121
11
0
17 Nov 2022
Lesion Guided Explainable Few Weak-shot Medical Report Generation
Jinghan Sun
Dong Wei
Liansheng Wang
Yefeng Zheng
MedIm
122
13
0
16 Nov 2022
Parameter-Efficient Tuning on Layer Normalization for Pre-trained Language Models
Wang Qi
Yu-Ping Ruan
Y. Zuo
Taihao Li
80
19
0
16 Nov 2022
PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu
Hang Hua
Zhengyuan Yang
Weijia Shi
Noah A. Smith
Jiebo Luo
123
106
0
15 Nov 2022
kogito: A Commonsense Knowledge Inference Toolkit
Mete Ismayilzada
Antoine Bosselut
71
7
0
15 Nov 2022
Will Large-scale Generative Models Corrupt Future Datasets?
Ryuichiro Hataya
Han Bao
Hiromi Arai
63
58
0
15 Nov 2022
A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation
Shijia Huang
Feng Li
Hao Zhang
Siyi Liu
Lei Zhang
Liwei Wang
68
5
0
15 Nov 2022
Is my automatic audio captioning system so bad? spider-max: a metric to consider several caption candidates
Etienne Labbé
Thomas Pellegrini
J. Pinquier
43
4
0
14 Nov 2022
Why Did the Chicken Cross the Road? Rephrasing and Analyzing Ambiguous Questions in VQA
Elias Stengel-Eskin
Jimena Guallar-Blasco
Yi Zhou
Benjamin Van Durme
UQLM
72
12
0
14 Nov 2022
Zero-shot Image Captioning by Anchor-augmented Vision-Language Space Alignment
Junyan Wang
Yi Zhang
Ming Yan
Ji Zhang
Jitao Sang
VLM
64
9
0
14 Nov 2022
Large-Scale Bidirectional Training for Zero-Shot Image Captioning
Taehoon Kim
Mark A Marsden
Pyunghwan Ahn
Sangyun Kim
Sihaeng Lee
Alessandra Sala
S. Kim
VLM
66
4
0
13 Nov 2022
DeltaNet: Conditional Medical Report Generation for COVID-19 Diagnosis
Xian Wu
Shuxin Yang
Zhaopeng Qiu
Shen Ge
Yangtian Yan
Xingwang Wu
Yefeng Zheng
S. Kevin Zhou
Li Xiao
MedIm
81
21
0
12 Nov 2022
Investigations in Audio Captioning: Addressing Vocabulary Imbalance and Evaluating Suitability of Language-Centric Performance Metrics
Sandeep Reddy Kothinti
Dimitra Emmanouilidou
50
3
0
12 Nov 2022
VieCap4H-VLSP 2021: ObjectAoA-Enhancing performance of Object Relation Transformer with Attention on Attention for Vietnamese image captioning
Nghia Hieu Nguyen
Duong T.D. Vo
Minh-Quan Ha
ViT
50
1
0
10 Nov 2022
OSIC: A New One-Stage Image Captioner Coined
Bo Wang
Zhao Zhang
Ming Zhao
Xiaojie Jin
Mingliang Xu
Meng Wang
VLM
87
4
0
04 Nov 2022
Video Event Extraction via Tracking Visual States of Arguments
Guang Yang
Manling Li
Jiajie Zhang
Xudong Lin
Shih-Fu Chang
Heng Ji
68
12
0
03 Nov 2022
CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation
Jun Wang
A. Bhalerao
Terry Yin
Simon See
Yulan He
MedIm
84
18
0
02 Nov 2022
Text-Only Training for Image Captioning using Noise-Injected CLIP
David Nukrai
Ron Mokady
Amir Globerson
VLM, CLIP
140
98
0
01 Nov 2022
E2E Refined Dataset
K. Toyama
Katsuhito Sudoh
Satoshi Nakamura
57
1
0
01 Nov 2022
Exploring Train and Test-Time Augmentations for Audio-Language Learning
Eungbeom Kim
Jinhee Kim
Yoori Oh
Kyungsu Kim
Minju Park
Jaeheon Sim
J. Lee
Kyogu Lee
46
12
0
31 Oct 2022
DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention
Fenglin Liu
Xian Wu
Shen Ge
Xuancheng Ren
Wei Fan
Xu Sun
Yuexian Zou
VLM
108
13
0
28 Oct 2022
FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning
Suvir Mirchandani
Licheng Yu
Mengjiao MJ Wang
Animesh Sinha
Wen-Jun Jiang
Tao Xiang
Ning Zhang
81
16
0
26 Oct 2022
Visual Semantic Parsing: From Images to Abstract Meaning Representation
M. A. Abdelsalam
Zhan Shi
Federico Fancellu
Kalliopi Basioti
Dhaivat Bhatt
Vladimir Pavlovic
Afsaneh Fazly
GNN
89
4
0
26 Oct 2022
End-to-End Multimodal Representation Learning for Video Dialog
Huda AlAmri
Anthony Bilic
Michael Hu
Apoorva Beedu
Irfan Essa
87
7
0
26 Oct 2022
Towards Unifying Reference Expression Generation and Comprehension
Duo Zheng
Tao Kong
Ya Jing
Jiaan Wang
Xiaojie Wang
ObjD
57
6
0
24 Oct 2022