CIDEr: Consensus-based Image Description Evaluation

20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,152 papers shown
Title
Multimodal Data Augmentation for Image Captioning using Diffusion Models
Changrong Xiao
S. Xu
Kunpeng Zhang
DiffM
47
10
0
03 May 2023
Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer
Etienne Labbé
J. Pinquier
Thomas Pellegrini
55
5
0
02 May 2023
VPGTrans: Transfer Visual Prompt Generator across LLMs
Ao Zhang
Hao Fei
Yuan Yao
Wei Ji
Li Li
Zhiyuan Liu
Tat-Seng Chua
MLLM
VLM
45
86
0
02 May 2023
Quality-agnostic Image Captioning to Safely Assist People with Vision Impairment
Lu Yu
Malvina Nikandrou
Jiali Jin
Verena Rieser
52
5
0
28 Apr 2023
ChartSumm: A Comprehensive Benchmark for Automatic Chart Summarization of Long and Short Summaries
Raian Rahman
Rizvi Hasan
Abdullah Al Farhad
Md Tahmid Rahman Laskar
Md. Hamjajul Ashmafee
A. Kamal
34
24
0
26 Apr 2023
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping
Junyan Wang
Ming Yan
Yi Zhang
Jitao Sang
CLIP
VLM
29
8
0
26 Apr 2023
Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining
Bingqian Lin
Zicong Chen
Mingjie Li
Haokun Lin
Hang Xu
...
Ling-Hao Chen
Xiaojun Chang
Yi Yang
L. Xing
Xiaodan Liang
LM&MA
MedIm
AI4CE
64
14
0
26 Apr 2023
A Review of Deep Learning for Video Captioning
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Min Zhang
Fatih Porikli
3DV
47
21
0
22 Apr 2023
Image-text Retrieval via Preserving Main Semantics of Vision
Xu Zhang
Xinzheng Niu
Philippe Fournier-Viger
Xudong Dai
VLM
21
5
0
20 Apr 2023
MPMQA: Multimodal Question Answering on Product Manuals
Liangfu Zhang
Anwen Hu
Jing Zhang
Shuo Hu
Qin Jin
35
9
0
19 Apr 2023
TTIDA: Controllable Generative Data Augmentation via Text-to-Text and Text-to-Image Models
Yuwei Yin
Jean Kaddour
Xiang Zhang
Yixin Nie
Zhenguang Liu
Lingpeng Kong
Qi Liu
DiffM
44
10
0
18 Apr 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
Jing Liu
VLM
45
107
0
17 Apr 2023
Interactive and Explainable Region-guided Radiology Report Generation
Tim Tanida
Philip Muller
Georgios Kaissis
Daniel Rueckert
MedIm
76
112
0
17 Apr 2023
Tractable Control for Autoregressive Language Generation
Honghua Zhang
Meihua Dang
Nanyun Peng
Guy Van den Broeck
BDL
43
40
0
15 Apr 2023
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text
Wanrong Zhu
Jack Hessel
Anas Awadalla
S. Gadre
Jesse Dodge
Alex Fang
Youngjae Yu
Ludwig Schmidt
William Yang Wang
Yejin Choi
VLM
42
170
0
14 Apr 2023
A-CAP: Anticipation Captioning with Commonsense Knowledge
D. Vo
Quoc-An Luong
Akihiro Sugimoto
Hideki Nakayama
37
2
0
13 Apr 2023
CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes
Maria Parelli
Alexandros Delitzas
Nikolas Hars
G. Vlassis
Sotiris Anagnostidis
Gregor Bachmann
Thomas Hofmann
CLIP
26
50
0
12 Apr 2023
HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models
Eslam Mohamed Bakr
Pengzhan Sun
Xiaoqian Shen
Faizan Farooq Khan
Li Erran Li
Mohamed Elhoseiny
VLM
40
77
0
11 Apr 2023
Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond
Ensheng Shi
Yanlin Wang
Hongyu Zhang
Lun Du
Shi Han
Dongmei Zhang
Hongbin Sun
50
43
0
11 Apr 2023
SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries
Hassan Mkhallati
A. Cioppa
Silvio Giancola
Guohao Li
Marc Van Droogenbroeck
35
34
0
10 Apr 2023
WebBrain: Learning to Generate Factually Correct Articles for Queries by Grounding on Large Web Corpus
Hongjing Qian
Yutao Zhu
Zhicheng Dou
Haoqi Gu
Xinyu Zhang
Zheng Liu
Ruofei Lai
Bo Zhao
J. Nie
Ji-Rong Wen
43
25
0
10 Apr 2023
Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder
Z. Fu
W. Lam
Qian Yu
Anthony Man-Cho So
Shengding Hu
Zhiyuan Liu
Nigel Collier
AuLLM
42
41
0
08 Apr 2023
Model-Agnostic Gender Debiased Image Captioning
Yusuke Hirota
Yuta Nakashima
Noa Garcia
FaML
58
18
0
07 Apr 2023
Graph Attention for Automated Audio Captioning
Feiyang Xiao
Jian Guan
Qiaoxi Zhu
Wenwu Wang
40
8
0
07 Apr 2023
Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions
Jia-Hong Huang
Modar Alfadly
Guohao Li
Marcel Worring
OOD
AAML
57
5
0
06 Apr 2023
METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens
Zhanyu Wang
Lingqiao Liu
Lei Wang
Luping Zhou
MedIm
31
75
0
05 Apr 2023
Cross-Domain Image Captioning with Discriminative Finetuning
Roberto Dessì
Michele Bevilacqua
Eleonora Gualdoni
Nathanaël Carraz Rakotonirina
Francesca Franzon
Marco Baroni
CLIP
44
19
0
04 Apr 2023
Changes to Captions: An Attentive Network for Remote Sensing Change Captioning
Shizhen Chang
Pedram Ghamisi
35
43
0
03 Apr 2023
Prefix tuning for automated audio captioning
Minkyu Kim
Kim Sung-Bin
Tae-Hyun Oh
35
43
0
30 Mar 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
60
202
0
30 Mar 2023
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Lucas Beyer
Bo Wan
Gagan Madan
Filip Pavetić
Andreas Steiner
...
Emanuele Bugliarello
Tianlin Li
Qihang Yu
Liang-Chieh Chen
Xiaohua Zhai
67
8
0
30 Mar 2023
AutoAD: Movie Description in Context
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
47
35
0
29 Mar 2023
Hierarchical Video-Moment Retrieval and Step-Captioning
Abhaysinh Zala
Jaemin Cho
Satwik Kottur
Xilun Chen
Barlas Oğuz
Yashar Mehdad
Mohit Bansal
3DV
50
52
0
29 Mar 2023
Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models
A. Maharana
Amita Kamath
Christopher Clark
Mohit Bansal
Aniruddha Kembhavi
50
3
0
28 Mar 2023
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
Yaowei Li
Bang-ju Yang
Xuxin Cheng
Zhihong Zhu
Hongxiang Li
Yuexian Zou
37
31
0
28 Mar 2023
Fine-grained Audible Video Description
Xuyang Shen
Dong Li
Jinxing Zhou
Zhen Qin
Bowen He
...
Yuchao Dai
Lingpeng Kong
Meng Wang
Yu Qiao
Yiran Zhong
VGen
68
11
0
27 Mar 2023
SEM-POS: Grammatically and Semantically Correct Video Captioning
Asmar Nadeem
A. Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
32
8
0
26 Mar 2023
GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation
Ji Qi
Jifan Yu
Teng Tu
Kunyu Gao
Yifan Xu
...
Juanzi Li
Jie Tang
Weidong Guo
Hui Liu
Yu-Syuan Xu
51
19
0
26 Mar 2023
VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining
Junjie Ke
Keren Ye
Jiahui Yu
Yonghui Wu
P. Milanfar
Feng Yang
VLM
57
58
0
24 Mar 2023
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models
Dohwan Ko
Joon-Young Choi
Hyeong Kyu Choi
Kyoung-Woon On
Byungseok Roh
Hyunwoo J. Kim
59
20
0
23 Mar 2023
Text with Knowledge Graph Augmented Transformer for Video Captioning
Xin Gu
G. Chen
Yufei Wang
Libo Zhang
Tiejian Luo
Longyin Wen
51
49
0
22 Mar 2023
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
Sara Sarto
Manuele Barraco
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
44
57
0
21 Mar 2023
VideoXum: Cross-modal Visual and Textural Summarization of Videos
Jingyang Lin
Hang Hua
Ming Chen
Yikang Li
Jenhao Hsiao
C. Ho
Jiebo Luo
38
31
0
21 Mar 2023
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Yushi Hu
Benlin Liu
Jungo Kasai
Yizhong Wang
Mari Ostendorf
Ranjay Krishna
Noah A. Smith
EGVM
53
222
0
21 Mar 2023
Multi-modal reward for visual relationships-based image captioning
Ali Abedi
Hossein Karshenas
Peyman Adibi
69
2
0
19 Mar 2023
SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models
Vithursan Thangarasa
Abhay Gupta
William Marshall
Tianda Li
Kevin Leong
D. DeCoste
Sean Lie
Shreyas Saxena
MoE
AI4CE
34
18
0
18 Mar 2023
Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation
Mingjie Li
Bingqian Lin
Zicong Chen
Haokun Lin
Xiaodan Liang
Xiaojun Chang
MedIm
33
109
0
18 Mar 2023
GNNFormer: A Graph-based Framework for Cytopathology Report Generation
Yangqiaoyu Zhou
Kai-Lang Yao
Wusuo Li
MedIm
24
1
0
17 Mar 2023
CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos
Seungju Han
Jack Hessel
Nouha Dziri
Yejin Choi
Youngjae Yu
VGen
38
18
0
17 Mar 2023
Cross-Modal Causal Intervention for Medical Report Generation
Weixing Chen
Yang-Yang Liu
Ce Wang
Jiarui Zhu
Shen Zhao
Guanbin Li
Cheng-Lin Liu
39
7
0
16 Mar 2023