CIDEr: Consensus-based Image Description Evaluation


20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,142 papers shown
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter
Bang-ju Yang
Tong Zhang
Yuexian Zou
CLIP
25
20
0
30 Nov 2021
Neural Attention for Image Captioning: Review of Outstanding Methods
Zanyar Zohourianshahzadi
Jugal Kalita
VLM
35
45
0
29 Nov 2021
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Yoad Tewel
Yoav Shalev
Idan Schwartz
Lior Wolf
VLM
36
192
0
29 Nov 2021
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Faisal Ahmed
Zhe Gan
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
19
236
0
25 Nov 2021
Less is More: Generating Grounded Navigation Instructions from Landmarks
Su Wang
Ceslee Montgomery
Jordi Orbay
Vighnesh Birodkar
Aleksandra Faust
Izzeddin Gur
Natasha Jaques
Austin Waters
Jason Baldridge
Peter Anderson
25
63
0
25 Nov 2021
Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets
Marcella Cornia
Lorenzo Baraldi
G. Fiameni
Rita Cucchiara
22
12
0
24 Nov 2021
Hierarchical Modular Network for Video Captioning
Hanhua Ye
Guorong Li
Yuankai Qi
Shuhui Wang
Qingming Huang
Ming-Hsuan Yang
27
67
0
24 Nov 2021
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Faisal Ahmed
Zicheng Liu
Yumao Lu
Lijuan Wang
34
111
0
23 Nov 2021
RedCaps: web-curated image-text data created by the people, for the people
Karan Desai
Gaurav Kaul
Zubin Aysola
Justin Johnson
31
162
0
22 Nov 2021
L-Verse: Bidirectional Generation Between Image and Text
Taehoon Kim
Gwangmo Song
Sihaeng Lee
Sangyun Kim
Yewon Seo
Soonyoung Lee
S. Kim
Honglak Lee
Kyunghoon Bae
31
25
0
22 Nov 2021
DVCFlow: Modeling Information Flow Towards Human-like Video Captioning
Xu Yan
Zhengcong Fei
Shuhui Wang
Qingming Huang
Qi Tian
VGen
40
4
0
19 Nov 2021
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
Jianfeng Wang
Xiaowei Hu
Zhe Gan
Zhengyuan Yang
Xiyang Dai
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
31
57
0
19 Nov 2021
ClipCap: CLIP Prefix for Image Captioning
Ron Mokady
Amir Hertz
Amit H. Bermano
CLIP
VLM
28
658
0
18 Nov 2021
Transparent Human Evaluation for Image Captioning
Jungo Kasai
Keisuke Sakaguchi
Lavinia Dunagan
Jacob Morrison
Ronan Le Bras
Yejin Choi
Noah A. Smith
33
47
0
17 Nov 2021
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Yaya Shi
Xu Yang
Haiyang Xu
Chunfen Yuan
Bing Li
Weiming Hu
Zhengjun Zha
39
33
0
17 Nov 2021
Co-segmentation Inspired Attention Module for Video-based Computer Vision Tasks
Arulkumar Subramaniam
Jayesh Vaidya
Muhammed Ameen
Athira M. Nambiar
Anurag Mittal
30
7
0
14 Nov 2021
Visual Intelligence through Human Interaction
Ranjay Krishna
Mitchell L. Gordon
Fei-Fei Li
Michael S. Bernstein
29
8
0
12 Nov 2021
The Curious Layperson: Fine-Grained Image Recognition without Expert Labels
Subhabrata Choudhury
Iro Laina
Christian Rupprecht
Andrea Vedaldi
VLM
38
9
0
05 Nov 2021
Transparency of Deep Neural Networks for Medical Image Analysis: A Review of Interpretability Methods
Zohaib Salahuddin
Henry C. Woodruff
A. Chatterjee
Philippe Lambin
29
306
0
01 Nov 2021
EventNarrative: A large-scale Event-centric Dataset for Knowledge Graph-to-Text Generation
Anthony Colas
A. Sadeghian
Yue Wang
D. Wang
23
21
0
30 Oct 2021
Automatic Knowledge Augmentation for Generative Commonsense Reasoning
Jaehyung Seo
Chanjun Park
Sugyeong Eo
Hyeonseok Moon
Heuiseok Lim
ReLM
LRM
16
3
0
30 Oct 2021
Discovering Non-monotonic Autoregressive Orderings with Variational Inference
Xuanlin Li
Brandon Trabucco
Dongmin Park
Michael Luo
S. Shen
Trevor Darrell
Yang Gao
27
12
0
27 Oct 2021
Bangla Image Caption Generation through CNN-Transformer based Encoder-Decoder Network
Yuansan Liu
MD Abdullah Al Nasim
Sourav Saha
Faria Afrin
Raisa Mallik
Sathishkumar Samiappan
ViT
16
11
0
24 Oct 2021
Exploiting Cross-Modal Prediction and Relation Consistency for Semi-Supervised Image Captioning
Yang Yang
Haoran Wei
Hengshu Zhu
Dianhai Yu
Hui Xiong
Jian Yang
SSL
14
33
0
22 Oct 2021
Cortico-cerebellar networks as decoupling neural interfaces
J. Pemberton
E. Boven
Richard Apps
Rui Ponte Costa
35
6
0
21 Oct 2021
Better than Average: Paired Evaluation of NLP Systems
Maxime Peyrard
Wei Zhao
Steffen Eger
Robert West
ELM
21
24
0
20 Oct 2021
A Self-Explainable Stylish Image Captioning Framework via Multi-References
Chengxi Li
Brent Harrison
26
0
0
20 Oct 2021
R$^3$Net:Relation-embedded Representation Reconstruction Network for Change Captioning
Yunbin Tu
Liang Li
C. Yan
Shengxiang Gao
Zhengtao Yu
39
22
0
20 Oct 2021
A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation
Yupan Huang
Bei Liu
Jianlong Fu
Yutong Lu
DiffM
30
5
0
19 Oct 2021
Unifying Multimodal Transformer for Bi-directional Image and Text Generation
Yupan Huang
Hongwei Xue
Bei Liu
Yutong Lu
21
57
0
19 Oct 2021
BEAMetrics: A Benchmark for Language Generation Evaluation Evaluation
Thomas Scialom
Felix Hill
30
7
0
18 Oct 2021
Think Before You Speak: Explicitly Generating Implicit Commonsense Knowledge for Response Generation
Pei Zhou
Karthik Gopalakrishnan
Behnam Hedayatnia
Seokhwan Kim
Jay Pujara
Xiang Ren
Yang Liu
Dilek Z. Hakkani-Tür
44
41
0
16 Oct 2021
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models
Woojeong Jin
Yu Cheng
Yelong Shen
Weizhu Chen
Xiang Ren
VLM
VPVLM
MLLM
35
130
0
16 Oct 2021
Self-Annotated Training for Controllable Image Captioning
Zhangzi Zhu
Tianlei Wang
Hong Qu
32
2
0
16 Oct 2021
Guiding Visual Question Generation
Nihir Vedd
Zixu Wang
Marek Rei
Yishu Miao
Lucia Specia
89
23
0
15 Oct 2021
Diverse Audio Captioning via Adversarial Training
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM
GAN
48
28
0
13 Oct 2021
CLIP4Caption: CLIP for Video Caption
Mingkang Tang
Zhanyu Wang
Zhenhua Liu
Fengyun Rao
Dian Li
Xiu Li
CLIP
VLM
35
150
0
13 Oct 2021
Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information
Zhongjie Ye
Helin Wang
Dongchao Yang
Yuexian Zou
40
27
0
12 Oct 2021
Semi-Autoregressive Image Captioning
Xu Yan
Zhengcong Fei
Zekang Li
Shuhui Wang
Qingming Huang
Qi Tian
35
23
0
11 Oct 2021
CLIP4Caption ++: Multi-CLIP for Video Caption
Mingkang Tang
Zhanyu Wang
Zhaoyang Zeng
Feng Rao
Dian Li
VLM
CLIP
17
7
0
11 Oct 2021
Can Audio Captions Be Evaluated with Image Caption Metrics?
Zelin Zhou
Zhiling Zhang
Xuenan Xu
Zeyu Xie
Mengyue Wu
Kenny Q. Zhu
30
43
0
10 Oct 2021
Toward a Human-Level Video Understanding Intelligence
Y. Heo
Minsu Lee
Seongho Choi
Woo Suk Choi
Minjung Shin
Minjoon Jung
Jeh-Kwang Ryu
Byoung-Tak Zhang
19
0
0
08 Oct 2021
End-to-End Supermask Pruning: Learning to Prune Image Captioning Models
J. Tan
C. Chan
Joon Huang Chuah
VLM
54
16
0
07 Oct 2021
Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching
Ali Furkan Biten
Andrés Mafla
Lluís Gómez
Dimosthenis Karatzas
197
17
0
06 Oct 2021
Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning
Ali Furkan Biten
L. G. I. Bigorda
Dimosthenis Karatzas
102
57
0
04 Oct 2021
Audio Captioning Using Sound Event Detection
Ayşegül Özkaya Eren
M. Sert
43
8
0
04 Oct 2021
Geometry Attention Transformer with Position-aware LSTMs for Image Captioning
Chi-Yin Wang
Yulin Shen
Luping Ji
ViT
52
49
0
01 Oct 2021
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations
Mohammadreza Zolfaghari
Yi Zhu
Peter V. Gehler
Thomas Brox
137
127
0
30 Sep 2021
Geometry-Entangled Visual Semantic Transformer for Image Captioning
Ling Cheng
Wei Wei
Feida Zhu
Yong Liu
Chunyan Miao
ViT
21
3
0
29 Sep 2021
CIDEr-R: Robust Consensus-based Image Description Evaluation
G. O. D. Santos
Esther Luna Colombini
Sandra Avila
47
30
0
28 Sep 2021