ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1411.5726
  4. Cited By
CIDEr: Consensus-based Image Description Evaluation
v1v2 (latest)

CIDEr: Consensus-based Image Description Evaluation

20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
ArXiv (abs)PDFHTML

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,184 papers shown
Title
FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment
  Act Flows
FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows
Jianqiao Zhao
Yanyang Li
Wanyu Du
Yangfeng Ji
Dong Yu
Michael R. Lyu
Liwei Wang
72
4
0
14 Feb 2022
I-Tuning: Tuning Frozen Language Models with Image for Lightweight Image
  Captioning
I-Tuning: Tuning Frozen Language Models with Image for Lightweight Image Captioning
Ziyang Luo
Zhipeng Hu
Yadong Xi
Rongsheng Zhang
Jing Ma
VLM
52
14
0
14 Feb 2022
Describing image focused in cognitive and visual details for visually
  impaired people: An approach to generating inclusive paragraphs
Describing image focused in cognitive and visual details for visually impaired people: An approach to generating inclusive paragraphs
Daniel Louzada Fernandes
Marcos Henrique Fonseca Ribeiro
F. Cerqueira
Michel Melo Silva
37
7
0
10 Feb 2022
Image Difference Captioning with Pre-training and Contrastive Learning
Image Difference Captioning with Pre-training and Contrastive Learning
Linli Yao
Weiying Wang
Qin Jin
SSLVLM
81
43
0
09 Feb 2022
DALL-Eval: Probing the Reasoning Skills and Social Biases of
  Text-to-Image Generation Models
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models
Jaemin Cho
Abhaysinh Zala
Joey Tianyi Zhou
ViT
258
193
0
08 Feb 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple
  Sequence-to-Sequence Learning Framework
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLMObjD
258
884
0
07 Feb 2022
Webly Supervised Concept Expansion for General Purpose Vision Models
Webly Supervised Concept Expansion for General Purpose Vision Models
Amita Kamath
Christopher Clark
Tanmay Gupta
Eric Kolve
Derek Hoiem
Aniruddha Kembhavi
VLM
97
55
0
04 Feb 2022
Joint Speech Recognition and Audio Captioning
Joint Speech Recognition and Audio Captioning
Chaitanya Narisetty
E. Tsunoo
Xuankai Chang
Yosuke Kashiwagi
Michael Hentschel
Shinji Watanabe
47
10
0
03 Feb 2022
Deep Learning Approaches on Image Captioning: A Review
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi
H. Pourreza
H. Mahyar
VLM
136
101
0
31 Jan 2022
A Frustratingly Simple Approach for End-to-End Image Captioning
A Frustratingly Simple Approach for End-to-End Image Captioning
Ziyang Luo
Yadong Xi
Rongsheng Zhang
Jing Ma
VLMMLLM
79
16
0
30 Jan 2022
BERTHA: Video Captioning Evaluation Via Transfer-Learned Human
  Assessment
BERTHA: Video Captioning Evaluation Via Transfer-Learned Human Assessment
Luis Lebron
Yvette Graham
Kevin McGuinness
K. Kouramas
Noel E. O'Connor
89
3
0
25 Jan 2022
Transformers in Medical Imaging: A Survey
Transformers in Medical Imaging: A Survey
Fahad Shamshad
Salman Khan
Syed Waqas Zamir
Muhammad Haris Khan
Munawar Hayat
Fahad Shahbaz Khan
Huazhu Fu
ViTLM&MAMedIm
197
712
0
24 Jan 2022
Improving Chest X-Ray Report Generation by Leveraging Warm Starting
Improving Chest X-Ray Report Generation by Leveraging Warm Starting
Aaron Nicolson
Jason Dowling
Bevan Koopman
ViTLM&MAMedIm
101
97
0
24 Jan 2022
WIDAR -- Weighted Input Document Augmented ROUGE
WIDAR -- Weighted Input Document Augmented ROUGE
Raghav Jain
Vaibhav Mavi
Anubhav Jangra
S. Saha
59
4
0
23 Jan 2022
End-to-end Generative Pretraining for Multimodal Video Captioning
End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo
Arsha Nagrani
Anurag Arnab
Cordelia Schmid
76
170
0
20 Jan 2022
Instance-aware Prompt Learning for Language Understanding and Generation
Instance-aware Prompt Learning for Language Understanding and Generation
Feihu Jin
Jinliang Lu
Jiajun Zhang
Chengqing Zong
57
33
0
18 Jan 2022
What Makes the Story Forward? Inferring Commonsense Explanations as
  Prompts for Future Event Generation
What Makes the Story Forward? Inferring Commonsense Explanations as Prompts for Future Event Generation
Li Lin
Yixin Cao
Lifu Huang
Shuang Li
Xuming Hu
Lijie Wen
Jianmin Wang
AI4TS
79
16
0
18 Jan 2022
Prior Knowledge Enhances Radiology Report Generation
Prior Knowledge Enhances Radiology Report Generation
Song Wang
Liyan Tang
Mingquan Lin
George Shih
Ying Ding
Yifan Peng
MedIm
65
24
0
11 Jan 2022
Local Information Assisted Attention-free Decoder for Audio Captioning
Local Information Assisted Attention-free Decoder for Audio Captioning
Feiyang Xiao
Jian Guan
Haiyan Lan
Qiaoxi Zhu
Wenwu Wang
98
11
0
10 Jan 2022
Glance and Focus Networks for Dynamic Visual Recognition
Glance and Focus Networks for Dynamic Visual Recognition
Gao Huang
Yulin Wang
Kangchen Lv
Haojun Jiang
Wenhui Huang
Pengfei Qi
S. Song
3DH
150
50
0
09 Jan 2022
Compact Bidirectional Transformer for Image Captioning
Compact Bidirectional Transformer for Image Captioning
Yuanen Zhou
Zhenzhen Hu
Daqing Liu
Huixia Ben
Meng Wang
VLM
67
17
0
06 Jan 2022
All You Need In Sign Language Production
All You Need In Sign Language Production
R. Rastgoo
Kourosh Kiani
Sergio Escalera
V. Athitsos
Mohammad Sabokrou
51
8
0
05 Jan 2022
StyleM: Stylized Metrics for Image Captioning Built with Contrastive
  N-grams
StyleM: Stylized Metrics for Image Captioning Built with Contrastive N-grams
Chengxi Li
Brent Harrison
110
3
0
04 Jan 2022
Radiology Report Generation with a Learned Knowledge Base and
  Multi-modal Alignment
Radiology Report Generation with a Learned Knowledge Base and Multi-modal Alignment
Shuxin Yang
Xian Wu
Shen Ge
S.Kevin Zhou
Li Xiao
MedIm
70
97
0
30 Dec 2021
Knowledge Matters: Radiology Report Generation with General and Specific
  Knowledge
Knowledge Matters: Radiology Report Generation with General and Specific Knowledge
Shuxin Yang
Xian Wu
Shen Ge
S.Kevin Zhou
Li Xiao
MedIm
91
120
0
30 Dec 2021
Synchronized Audio-Visual Frames with Fractional Positional Encoding for
  Transformers in Video-to-Text Translation
Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation
Philipp Harzig
Moritz Einfalt
Rainer Lienhart
ViT
63
2
0
28 Dec 2021
Multimodal Image Synthesis and Editing: The Generative AI Era
Multimodal Image Synthesis and Editing: The Generative AI Era
Fangneng Zhan
Yingchen Yu
Rongliang Wu
Jiahui Zhang
Shijian Lu
Lingjie Liu
Adam Kortylewski
Christian Theobalt
Eric Xing
EGVM
198
51
0
27 Dec 2021
ScanQA: 3D Question Answering for Spatial Scene Understanding
ScanQA: 3D Question Answering for Spatial Scene Understanding
Daich Azuma
Taiki Miyanishi
Shuhei Kurita
M. Kawanabe
106
208
0
20 Dec 2021
NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead
  Heuristics
NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics
Ximing Lu
Sean Welleck
Peter West
Liwei Jiang
Jungo Kasai
...
Lianhui Qin
Youngjae Yu
Rowan Zellers
Noah A. Smith
Yejin Choi
70
165
0
16 Dec 2021
Dense Video Captioning Using Unsupervised Semantic Information
Dense Video Captioning Using Unsupervised Semantic Information
Valter Estevam
Rayson Laroca
Hélio Pedrini
David Menotti
93
10
0
15 Dec 2021
KGR^4: Retrieval, Retrospect, Refine and Rethink for Commonsense
  Generation
KGR^4: Retrieval, Retrospect, Refine and Rethink for Commonsense Generation
Xin Liu
Dayiheng Liu
Baosong Yang
Haibo Zhang
Junwei Ding
Wenqing Yao
Weihua Luo
Haiying Zhang
Jinsong Su
LRM
61
8
0
15 Dec 2021
CoCo-BERT: Improving Video-Language Pre-training with Contrastive
  Cross-modal Matching and Denoising
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising
Jianjie Luo
Yehao Li
Yingwei Pan
Ting Yao
Hongyang Chao
Tao Mei
VLM
74
42
0
14 Dec 2021
MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and
  Unpaired Text-based Image Captioning
MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning
Wenqiao Zhang
Haochen Shi
Jiannan Guo
Shengyu Zhang
Qingpeng Cai
Juncheng Li
Sihui Luo
Yueting Zhuang
DiffM
100
46
0
13 Dec 2021
Contextualized Scene Imagination for Generative Commonsense Reasoning
Contextualized Scene Imagination for Generative Commonsense Reasoning
Peifeng Wang
Jonathan Zamora
Junfeng Liu
Filip Ilievski
Muhao Chen
Xiang Ren
ReLMLRM
101
16
0
12 Dec 2021
Unified Multimodal Pre-training and Prompt-based Tuning for
  Vision-Language Understanding and Generation
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation
Tianyi Liu
Zuxuan Wu
Wenhan Xiong
Jingjing Chen
Yu-Gang Jiang
VLMMLLM
88
10
0
10 Dec 2021
Injecting Semantic Concepts into End-to-End Image Captioning
Injecting Semantic Concepts into End-to-End Image Captioning
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lin Liang
Zhe Gan
Lijuan Wang
Yezhou Yang
Zicheng Liu
ViTVLM
86
91
0
09 Dec 2021
Self-Supervised Image-to-Text and Text-to-Image Synthesis
Self-Supervised Image-to-Text and Text-to-Image Synthesis
Anindya Sundar Das
S. Saha
SSL
28
5
0
09 Dec 2021
Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand
Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Lavinia Dunagan
Jacob Morrison
Alexander R. Fabbri
Yejin Choi
Noah A. Smith
101
40
0
08 Dec 2021
Search and Learn: Improving Semantic Coverage for Data-to-Text
  Generation
Search and Learn: Improving Semantic Coverage for Data-to-Text Generation
Shailza Jolly
Zi Xuan Zhang
Andreas Dengel
Lili Mou
75
11
0
06 Dec 2021
InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation
InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation
Pierre Colombo
Chloe Clave
Pablo Piantanida
134
44
0
02 Dec 2021
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning
  and Visual Grounding
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding
Dave Zhenyu Chen
Qirui Wu
Matthias Nießner
Angel X. Chang
81
32
0
02 Dec 2021
Object-Centric Unsupervised Image Captioning
Object-Centric Unsupervised Image Captioning
Zihang Meng
David Yang
Xuefei Cao
Ashish Shah
Ser-Nam Lim
OCLVLM
80
12
0
02 Dec 2021
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does
  Matter
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter
Bang-ju Yang
Tong Zhang
Yuexian Zou
CLIP
70
20
0
30 Nov 2021
Neural Attention for Image Captioning: Review of Outstanding Methods
Neural Attention for Image Captioning: Review of Outstanding Methods
Zanyar Zohourianshahzadi
Jugal Kalita
VLM
95
47
0
29 Nov 2021
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic
  Arithmetic
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Yoad Tewel
Yoav Shalev
Idan Schwartz
Lior Wolf
VLM
122
197
0
29 Nov 2021
SwinBERT: End-to-End Transformers with Sparse Attention for Video
  Captioning
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Faisal Ahmed
Zhe Gan
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
85
247
0
25 Nov 2021
Less is More: Generating Grounded Navigation Instructions from Landmarks
Less is More: Generating Grounded Navigation Instructions from Landmarks
Su Wang
Ceslee Montgomery
Jordi Orbay
Vighnesh Birodkar
Aleksandra Faust
Izzeddin Gur
Natasha Jaques
Austin Waters
Jason Baldridge
Peter Anderson
135
64
0
25 Nov 2021
Generating More Pertinent Captions by Leveraging Semantics and Style on
  Multi-Source Datasets
Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets
Marcella Cornia
Lorenzo Baraldi
G. Fiameni
Rita Cucchiara
109
12
0
24 Nov 2021
Hierarchical Modular Network for Video Captioning
Hierarchical Modular Network for Video Captioning
Hanhua Ye
Guorong Li
Yuankai Qi
Shuhui Wang
Qingming Huang
Ming-Hsuan Yang
127
70
0
24 Nov 2021
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language
  Modeling
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Faisal Ahmed
Zicheng Liu
Yumao Lu
Lijuan Wang
146
117
0
23 Nov 2021
Previous
123...252627...424344
Next