ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1411.5726
  4. Cited By
CIDEr: Consensus-based Image Description Evaluation
v1v2 (latest)

CIDEr: Consensus-based Image Description Evaluation

20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
ArXiv (abs)PDFHTML

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,183 papers shown
Title
Towards a multimodal framework for remote sensing image change retrieval
  and captioning
Towards a multimodal framework for remote sensing image change retrieval and captioning
Roger Ferrod
Luigi Di Caro
Dino Ienco
52
2
0
19 Jun 2024
Enhancing Automated Audio Captioning via Large Language Models with
  Optimized Audio Encoding
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
Jizhong Liu
Gang Li
Junbo Zhang
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Yujun Wang
Bin Wang
AuLLM
135
5
0
19 Jun 2024
The Impact of Auxiliary Patient Data on Automated Chest X-Ray Report
  Generation and How to Incorporate It
The Impact of Auxiliary Patient Data on Automated Chest X-Ray Report Generation and How to Incorporate It
Aaron Nicolson
Shengyao Zhuang
Jason Dowling
Bevan Koopman
59
1
0
19 Jun 2024
RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote
  Sensing Image Understanding
RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding
Linrui Xu
Ling Zhao
Wang Guo
Qiujun Li
Kewang Long
Kaiqi Zou
Yuhan Wang
Haifeng Li
AI4TS
77
7
0
18 Jun 2024
VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote
  Sensing Image Understanding
VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding
Xiang Li
Jian Ding
Mohamed Elhoseiny
CoGe
75
33
0
18 Jun 2024
WildVision: Evaluating Vision-Language Models in the Wild with Human
  Preferences
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
Yujie Lu
Dongfu Jiang
Wenhu Chen
William Yang Wang
Yejin Choi
Bill Yuchen Lin
VLM
110
33
0
16 Jun 2024
Promoting Data and Model Privacy in Federated Learning through Quantized
  LoRA
Promoting Data and Model Privacy in Federated Learning through Quantized LoRA
Jianhao Zhu
Changze Lv
Xiaohua Wang
Muling Wu
Tianlong Li
Changze Lv
Zixuan Ling
Cenyuan Zhang
Xiaoqing Zheng
Xuanjing Huang
91
5
0
16 Jun 2024
Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in
  the Wild
Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild
Lingni Ma
Yuting Ye
Fangzhou Hong
Vladimir Guzov
Yifeng Jiang
...
C. Karen Liu
Ziwei Liu
Jakob Engel
R. D. Nardi
Richard Newcombe
94
25
0
14 Jun 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Holy Lovenia
Rahmad Mahendra
Salsabil Maulana Akbar
Lester James V. Miranda
Jennifer Santoso
...
Genta Indra Winata
Ruochen Zhang
Fajri Koto
Zheng-Xin Yong
Samuel Cahyawijaya
231
14
0
14 Jun 2024
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Matthieu Futeral
A. Zebaze
Pedro Ortiz Suarez
Julien Abadji
Rémi Lacroix
Cordelia Schmid
Rachel Bawden
Benoît Sagot
169
3
0
13 Jun 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images
  Interleaved with Text
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Qingyun Li
Zhe Chen
Weiyun Wang
Wenhai Wang
Shenglong Ye
...
Dahua Lin
Yu Qiao
Botian Shi
Conghui He
Jifeng Dai
VLMOffRL
119
27
0
12 Jun 2024
Tell Me What's Next: Textual Foresight for Generic UI Representations
Tell Me What's Next: Textual Foresight for Generic UI Representations
Andrea Burns
Kate Saenko
Bryan A. Plummer
LM&RoAI4TS
92
5
0
12 Jun 2024
ROADWork Dataset: Learning to Recognize, Observe, Analyze and Drive
  Through Work Zones
ROADWork Dataset: Learning to Recognize, Observe, Analyze and Drive Through Work Zones
Anurag Ghosh
R. Tamburo
Shen Zheng
Juan R. Alvarez-Padilla
Hailiang Zhu
Michael Cardei
Nicholas Dunn
Christoph Mertz
Srinivasa G. Narasimhan
93
1
0
11 Jun 2024
Situational Awareness Matters in 3D Vision Language Reasoning
Situational Awareness Matters in 3D Vision Language Reasoning
Yunze Man
Liang-Yan Gui
Yu-Xiong Wang
91
18
0
11 Jun 2024
Learning Domain-Invariant Features for Out-of-Context News Detection
Learning Domain-Invariant Features for Out-of-Context News Detection
Yimeng Gu
Mengqi Zhang
Ignacio Castro
Shu Wu
Gareth Tyson
98
2
0
11 Jun 2024
TRINS: Towards Multimodal Language Models that Can Read
TRINS: Towards Multimodal Language Models that Can Read
Ruiyi Zhang
Yanzhe Zhang
Jian Chen
Yufan Zhou
Jiuxiang Gu
Changyou Chen
Tong Sun
VLM
82
6
0
10 Jun 2024
Zero-Shot Audio Captioning Using Soft and Hard Prompts
Zero-Shot Audio Captioning Using Soft and Hard Prompts
Yiming Zhang
Xuenan Xu
Ruoyi Du
Haohe Liu
Yuan Dong
Zheng-Hua Tan
Wenwu Wang
Zhanyu Ma
VLM
77
4
0
10 Jun 2024
Vript: A Video Is Worth Thousands of Words
Vript: A Video Is Worth Thousands of Words
Dongjie Yang
Suyuan Huang
Chengqiang Lu
Xiaodong Han
Haoxin Zhang
Yan Gao
Yao Hu
Hai Zhao
VGen
147
31
0
10 Jun 2024
FLEUR: An Explainable Reference-Free Evaluation Metric for Image
  Captioning Using a Large Multimodal Model
FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model
Yebin Lee
Imseong Park
Myungjoo Kang
75
18
0
10 Jun 2024
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
Asmar Nadeem
Faegheh Sardari
R. Dawes
Syed Sameed Husain
Adrian Hilton
Armin Mustafa
111
4
0
10 Jun 2024
Stealthy Targeted Backdoor Attacks against Image Captioning
Stealthy Targeted Backdoor Attacks against Image Captioning
Wenshu Fan
Hongwei Li
Wenbo Jiang
Meng Hao
Shui Yu
Xiao Zhang
DiffM
65
6
0
09 Jun 2024
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Ke Xu
Ke Xu
AAMLVLM
133
14
0
08 Jun 2024
Seeing the Unseen: Visual Metaphor Captioning for Videos
Seeing the Unseen: Visual Metaphor Captioning for Videos
Abisek Rajakumar Kalarani
Pushpak Bhattacharyya
Sumit Shekhar
VLM
71
1
0
07 Jun 2024
MGIMM: Multi-Granularity Instruction Multimodal Model for
  Attribute-Guided Remote Sensing Image Detailed Description
MGIMM: Multi-Granularity Instruction Multimodal Model for Attribute-Guided Remote Sensing Image Detailed Description
Cong Yang
Zuchao Li
Lefei Zhang
74
2
0
07 Jun 2024
DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and
  Social Experiences
DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences
Yidong Huang
Jacob Sansom
Ziqiao Ma
Felix Gervits
Joyce Chai
116
18
0
05 Jun 2024
Multi-layer Learnable Attention Mask for Multimodal Tasks
Multi-layer Learnable Attention Mask for Multimodal Tasks
Wayner Barrios
SouYoung Jin
71
1
0
04 Jun 2024
Discrete Multimodal Transformers with a Pretrained Large Language Model
  for Mixed-Supervision Speech Processing
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing
V. Trinh
Rosy Southwell
Yiwen Guan
Xinlu He
Zhiyong Wang
Jacob Whitehill
OffRL
93
2
0
04 Jun 2024
Translation Deserves Better: Analyzing Translation Artifacts in
  Cross-lingual Visual Question Answering
Translation Deserves Better: Analyzing Translation Artifacts in Cross-lingual Visual Question Answering
Yujin Baek
Koanho Lee
Hyesu Lim
Jaeseok Kim
Junmo Park
Yu-Jung Heo
Du-Seong Chang
Jaegul Choo
38
3
0
04 Jun 2024
Understanding Retrieval Robustness for Retrieval-Augmented Image
  Captioning
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
Wenyan Li
Jiaang Li
R. Ramos
Raphael Tang
Desmond Elliott
VLM
121
3
0
04 Jun 2024
Diver: Large Language Model Decoding with Span-Level Mutual Information
  Verification
Diver: Large Language Model Decoding with Span-Level Mutual Information Verification
Jinliang Lu
Chen Wang
Jiajun Zhang
112
3
0
04 Jun 2024
Favi-Score: A Measure for Favoritism in Automated Preference Ratings for
  Generative AI Evaluation
Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation
Pius von Daniken
Jan Deriu
Don Tuggener
Mark Cieliebak
63
2
0
03 Jun 2024
OLIVE: Object Level In-Context Visual Embeddings
OLIVE: Object Level In-Context Visual Embeddings
Timothy Ossowski
Junjie Hu
OCLVLM
101
0
0
02 Jun 2024
Image Captioning via Dynamic Path Customization
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
77
0
0
01 Jun 2024
Artemis: Towards Referential Understanding in Complex Videos
Artemis: Towards Referential Understanding in Complex Videos
Jihao Qiu
Yuan Zhang
Xi Tang
Lingxi Xie
Tianren Ma
Pengyu Yan
David Doermann
Qixiang Ye
Yunjie Tian
VLMVGen
85
10
0
01 Jun 2024
Are Large Vision Language Models up to the Challenge of Chart
  Comprehension and Reasoning? An Extensive Investigation into the Capabilities
  and Limitations of LVLMs
Are Large Vision Language Models up to the Challenge of Chart Comprehension and Reasoning? An Extensive Investigation into the Capabilities and Limitations of LVLMs
Mohammed Saidul Islam
Raian Rahman
Ahmed Masry
Md Tahmid Rahman Laskar
Mir Tafseer Nayeem
Enamul Hoque
LRMELM
63
4
0
01 Jun 2024
Context-aware Difference Distilling for Multi-change Captioning
Context-aware Difference Distilling for Multi-change Captioning
Yunbin Tu
Liang-Sheng Li
Li Su
Zheng-Jun Zha
Chenggang Yan
Qin Huang
83
10
0
31 May 2024
Benchmarking and Improving Detail Image Caption
Benchmarking and Improving Detail Image Caption
Hongyuan Dong
Jiawen Li
Bohong Wu
Jiacong Wang
Yuan Zhang
Haoyuan Guo
VLMMLLM
103
31
0
29 May 2024
MindSemantix: Deciphering Brain Visual Experiences with a Brain-Language
  Model
MindSemantix: Deciphering Brain Visual Experiences with a Brain-Language Model
Ziqi Ren
Jie Li
Xuetong Xue
Xin Li
Fan Yang
Zhicheng Jiao
Xinbo Gao
97
3
0
29 May 2024
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification
Laura Fieback
Jakob Spiegelberg
Hanno Gottschalk
MLLM
232
5
0
29 May 2024
Recent Trends in Personalized Dialogue Generation: A Review of Datasets,
  Methodologies, and Evaluations
Recent Trends in Personalized Dialogue Generation: A Review of Datasets, Methodologies, and Evaluations
Yi-Pei Chen
Noriki Nishida
Hideki Nakayama
Yuji Matsumoto
LLMAG
93
15
0
28 May 2024
Seeing the Image: Prioritizing Visual Correlation by Contrastive
  Alignment
Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
Xin Xiao
Bohong Wu
Jiacong Wang
Chunyuan Li
Xun Zhou
Haoyuan Guo
VLM
73
9
0
28 May 2024
UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning
  for Radiology Images Efficiency with Transformer Models
UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models
Quan Van Nguyen
Huy Quang Pham
Dan Quang Tran
Thang Kien-Bao Nguyen
Nhat-Hao Nguyen-Dang
Bao-Thien Nguyen-Tat
MedIm
67
2
0
27 May 2024
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias
  Towards Vision-Language Tasks
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
Yunqi Zhang
Songda Li
Chunyuan Deng
Luyi Wang
Hui Zhao
121
0
0
27 May 2024
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
Zejun Li
Ruipu Luo
Jiwen Zhang
Minghui Qiu
Zhongyu Wei
Zhongyu Wei
LRMMLLM
185
17
0
27 May 2024
Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model
Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model
Kuan-Chih Huang
Xiangtai Li
Lu Qi
Shuicheng Yan
Ming-Hsuan Yang
LRM
175
12
0
27 May 2024
M$^3$GPT: An Advanced Multimodal, Multitask Framework for Motion
  Comprehension and Generation
M3^33GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation
Mingshuang Luo
Ruibing Hou
Hong Chang
Zimo Liu
Yaowei Wang
Shiguang Shan
85
10
0
25 May 2024
Text Generation: A Systematic Literature Review of Tasks, Evaluation,
  and Challenges
Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges
Jonas Becker
Jan Philip Wahle
Bela Gipp
Terry Ruas
122
11
0
24 May 2024
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception
Run Luo
Yunshui Li
Longze Chen
Wanwei He
Ting-En Lin
...
Zikai Song
Xiaobo Xia
Tongliang Liu
Min Yang
Binyuan Hui
VLMDiffM
188
22
0
24 May 2024
EMR-Merging: Tuning-Free High-Performance Model Merging
EMR-Merging: Tuning-Free High-Performance Model Merging
Chenyu Huang
Peng Ye
Tao Chen
Tong He
Xiangyu Yue
Wanli Ouyang
MoMe
89
45
0
23 May 2024
Multi-modality Regional Alignment Network for Covid X-Ray Survival
  Prediction and Report Generation
Multi-modality Regional Alignment Network for Covid X-Ray Survival Prediction and Report Generation
Zhusi Zhong
Jie Li
J. Sollee
Scott Collins
Harrison X. Bai
Paul J Zhang
Terrance Healey
Michael Atalay
Xinbo Gao
Zhicheng Jiao
66
1
0
23 May 2024
Previous
123...789...424344
Next