CIDEr: Consensus-based Image Description Evaluation

20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,184 papers shown
Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond
Ensheng Shi
Yanlin Wang
Hongyu Zhang
Lun Du
Shi Han
Dongmei Zhang
Hongbin Sun
89
45
0
11 Apr 2023
SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries
Hassan Mkhallati
A. Cioppa
Silvio Giancola
Guohao Li
Marc Van Droogenbroeck
77
34
0
10 Apr 2023
WebBrain: Learning to Generate Factually Correct Articles for Queries by Grounding on Large Web Corpus
Hongjing Qian
Yutao Zhu
Zhicheng Dou
Haoqi Gu
Xinyu Zhang
Zheng Liu
Ruofei Lai
Bo Zhao
J. Nie
Ji-Rong Wen
81
26
0
10 Apr 2023
Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder
Z. Fu
W. Lam
Qian Yu
Anthony Man-Cho So
Shengding Hu
Zhiyuan Liu
Nigel Collier
AuLLM
69
44
0
08 Apr 2023
Model-Agnostic Gender Debiased Image Captioning
Yusuke Hirota
Yuta Nakashima
Noa Garcia
FaML
122
18
0
07 Apr 2023
Graph Attention for Automated Audio Captioning
Feiyang Xiao
Jian Guan
Qiaoxi Zhu
Wenwu Wang
66
8
0
07 Apr 2023
Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions
Jia-Hong Huang
Modar Alfadly
Guohao Li
Marcel Worring
OOD, AAML
87
6
0
06 Apr 2023
METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens
Zhanyu Wang
Lingqiao Liu
Lei Wang
Luping Zhou
MedIm
77
76
0
05 Apr 2023
Cross-Domain Image Captioning with Discriminative Finetuning
Roberto Dessì
Michele Bevilacqua
Eleonora Gualdoni
Nathanaël Carraz Rakotonirina
Francesca Franzon
Marco Baroni
CLIP
101
19
0
04 Apr 2023
Changes to Captions: An Attentive Network for Remote Sensing Change Captioning
Shizhen Chang
Pedram Ghamisi
95
48
0
03 Apr 2023
Prefix tuning for automated audio captioning
Minkyu Kim
Kim Sung-Bin
Tae-Hyun Oh
104
46
0
30 Mar 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
181
220
0
30 Mar 2023
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Lucas Beyer
Bo Wan
Gagan Madan
Filip Pavetić
Andreas Steiner
...
Emanuele Bugliarello
Tianlin Li
Qihang Yu
Liang-Chieh Chen
Xiaohua Zhai
130
9
0
30 Mar 2023
AutoAD: Movie Description in Context
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
77
35
0
29 Mar 2023
Hierarchical Video-Moment Retrieval and Step-Captioning
Abhaysinh Zala
Jaemin Cho
Satwik Kottur
Xilun Chen
Barlas Ouguz
Yasher Mehdad
Joey Tianyi Zhou
3DV
98
54
0
29 Mar 2023
Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models
A. Maharana
Amita Kamath
Christopher Clark
Joey Tianyi Zhou
Aniruddha Kembhavi
85
3
0
28 Mar 2023
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
Yaowei Li
Bang-ju Yang
Xuxin Cheng
Zhihong Zhu
Hongxiang Li
Yuexian Zou
73
33
0
28 Mar 2023
Fine-grained Audible Video Description
Xuyang Shen
Dong Li
Jinxing Zhou
Zhen Qin
Bowen He
...
Yuchao Dai
Lingpeng Kong
Meng Wang
Yu Qiao
Yiran Zhong
VGen
94
11
0
27 Mar 2023
SEM-POS: Grammatically and Semantically Correct Video Captioning
Asmar Nadeem
A. Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
73
8
0
26 Mar 2023
GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation
Ji Qi
Jifan Yu
Teng Tu
Kunyu Gao
Yifan Xu
...
Juanzi Li
Jie Tang
Weidong Guo
Hui Liu
Yu-Syuan Xu
78
20
0
26 Mar 2023
VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining
Junjie Ke
Keren Ye
Jiahui Yu
Yonghui Wu
P. Milanfar
Feng Yang
VLM
102
61
0
24 Mar 2023
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models
Dohwan Ko
Joon-Young Choi
Hyeong Kyu Choi
Kyoung-Woon On
Byungseok Roh
Hyunwoo J. Kim
121
23
0
23 Mar 2023
Text with Knowledge Graph Augmented Transformer for Video Captioning
Xin Gu
G. Chen
Yufei Wang
Libo Zhang
Tiejian Luo
Longyin Wen
123
51
0
22 Mar 2023
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
Sara Sarto
Manuele Barraco
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
86
60
0
21 Mar 2023
VideoXum: Cross-modal Visual and Textural Summarization of Videos
Jingyang Lin
Hang Hua
Ming Chen
Yikang Li
Jenhao Hsiao
C. Ho
Jiebo Luo
109
33
0
21 Mar 2023
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Yushi Hu
Benlin Liu
Jungo Kasai
Yizhong Wang
Mari Ostendorf
Ranjay Krishna
Noah A. Smith
EGVM
90
239
0
21 Mar 2023
Multi-modal reward for visual relationships-based image captioning
Ali Abedi
Hossein Karshenas
Peyman Adibi
131
2
0
19 Mar 2023
SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models
Vithursan Thangarasa
Abhay Gupta
William Marshall
Tianda Li
Kevin Leong
D. DeCoste
Sean Lie
Shreyas Saxena
MoE, AI4CE
83
22
0
18 Mar 2023
Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation
Mingjie Li
Bingqian Lin
Zicong Chen
Haokun Lin
Xiaodan Liang
Xiaojun Chang
MedIm
82
117
0
18 Mar 2023
GNNFormer: A Graph-based Framework for Cytopathology Report Generation
Yangqiaoyu Zhou
Kai-Lang Yao
Wusuo Li
MedIm
51
1
0
17 Mar 2023
CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos
Seungju Han
Jack Hessel
Nouha Dziri
Yejin Choi
Youngjae Yu
VGen
94
19
0
17 Mar 2023
Cross-Modal Causal Intervention for Medical Report Generation
Weixing Chen
Yang-Yang Liu
Ce Wang
Jiarui Zhu
Shen Zhao
Guanbin Li
Cheng-Lin Liu
82
7
0
16 Mar 2023
Lana: A Language-Capable Navigator for Instruction Following and Generation
Xiaohan Wang
Wenguan Wang
Jiayi Shao
Yi Yang
LLMAG, LM&Ro
98
41
0
15 Mar 2023
PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning
Yongil Kim
Yerin Hwang
Hyeongu Yun
Seunghyun Yoon
Trung Bui
Kyomin Jung
70
6
0
15 Mar 2023
Implicit and Explicit Commonsense for Multi-sentence Video Captioning
Shih-Han Chou
James J. Little
Leonid Sigal
69
2
0
14 Mar 2023
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
Nitzan Bitton-Guetta
Yonatan Bitton
Jack Hessel
Ludwig Schmidt
Yuval Elovici
Gabriel Stanovsky
Roy Schwartz
VLM
226
70
0
13 Mar 2023
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions
Deyao Zhu
Jun Chen
Kilichbek Haydarov
Xiaoqian Shen
Wenxuan Zhang
Mohamed Elhoseiny
MLLM
100
106
0
12 Mar 2023
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation
Bang-ju Yang
Fenglin Liu
Yuexian Zou
Xian Wu
Yaowei Wang
David Clifton
88
9
0
11 Mar 2023
Learning Combinatorial Prompts for Universal Controllable Image Captioning
Zhen Wang
Jun Xiao
Yueting Zhuang
Fei Gao
Jian Shao
Long Chen
112
5
0
11 Mar 2023
Interpretable Visual Question Answering Referring to Outside Knowledge
He Zhu
Ren Togo
Takahiro Ogawa
Miki Haseyama
63
0
0
08 Mar 2023
Describe me an Aucklet: Generating Grounded Perceptual Category Descriptions
Bill Noble
N. Ilinykh
114
0
0
07 Mar 2023
Graph Neural Networks in Vision-Language Image Understanding: A Survey
Henry Senior
Greg Slabaugh
Shanxin Yuan
Luca Rossi
GNN
92
21
0
07 Mar 2023
Neighborhood Contrastive Transformer for Change Captioning
Yunbin Tu
Liang Li
Li Su
Kelvin Lu
Qin Huang
ViT
84
17
0
06 Mar 2023
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training
Wei Li
Linchao Zhu
Longyin Wen
Yi Yang
VLM
110
89
0
06 Mar 2023
Models See Hallucinations: Evaluating the Factuality in Video Captioning
Hui Liu
Xiaojun Wan
HILM
63
11
0
06 Mar 2023
Comparative study of Transformer and LSTM Network with attention mechanism on Image Captioning
Pranav Dandwate
Chaitanya Shahane
V. Jagtap
Shridevi C. Karande
101
9
0
05 Mar 2023
Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu
Linxi Fan
Edward Johns
Zhiding Yu
Chaowei Xiao
Anima Anandkumar
VLM, MLLM
142
25
0
04 Mar 2023
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
Xiaoping Han
Xiatian Zhu
Licheng Yu
Li Zhang
Yi-Zhe Song
Tao Xiang
VLM
87
45
0
04 Mar 2023
ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing
Zequn Zeng
Hao Zhang
Zhengjue Wang
Ruiying Lu
Dongsheng Wang
Bo Chen
BDL, DiffM
61
33
0
04 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS, VLM
175
242
0
27 Feb 2023