© 2025 ResearchTrend.AI, All rights reserved.

CIDEr: Consensus-based Image Description Evaluation

20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
arXiv: 1411.5726

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,137 papers shown
X-ray Made Simple: Lay Radiology Report Generation and Robust Evaluation
Kun Zhao
Chenghao Xiao
Chen Tang
Bohao Yang
Kai Ye
Noura Al Moubayed
Liang Zhan
Chenghua Lin
53
0
0
25 Jun 2024
UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos
Yuting Mei
Linli Yao
Qin Jin
42
1
0
24 Jun 2024
Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models?
Gregor Geigle
Radu Timofte
Goran Glavas
43
0
0
20 Jun 2024
From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment
Yusuke Hirota
Ryo Hachiuma
Chao-Han Huck Yang
Yuta Nakashima
VLM
43
4
0
20 Jun 2024
Adaptable Logical Control for Large Language Models
Honghua Zhang
Po-Nien Kung
Masahiro Yoshida
Mathias Niepert
Nanyun Peng
42
8
0
19 Jun 2024
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images
Rushikesh Zawar
Shaurya Dewan
Andrew F. Luo
Margaret M. Henderson
Michael J. Tarr
Leila Wehbe
VGen
CoGe
44
1
0
19 Jun 2024
Towards a multimodal framework for remote sensing image change retrieval and captioning
Roger Ferrod
Luigi Di Caro
Dino Ienco
24
2
0
19 Jun 2024
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
Jizhong Liu
Gang Li
Junbo Zhang
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Yujun Wang
Bin Wang
AuLLM
57
2
0
19 Jun 2024
The Impact of Auxiliary Patient Data on Automated Chest X-Ray Report Generation and How to Incorporate It
Aaron Nicolson
Shengyao Zhuang
Jason Dowling
Bevan Koopman
34
1
0
19 Jun 2024
RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding
Linrui Xu
Ling Zhao
Wang Guo
Qiujun Li
Kewang Long
Kaiqi Zou
Yuhan Wang
Haifeng Li
AI4TS
33
7
0
18 Jun 2024
VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding
Xiang Li
Jian Ding
Mohamed Elhoseiny
CoGe
37
21
0
18 Jun 2024
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
Yujie Lu
Dongfu Jiang
Wenhu Chen
William Yang Wang
Yejin Choi
Bill Yuchen Lin
VLM
51
26
0
16 Jun 2024
Promoting Data and Model Privacy in Federated Learning through Quantized LoRA
Jianhao Zhu
Changze Lv
Xiaohua Wang
Muling Wu
Wenhao Liu
Tianlong Li
Zixuan Ling
Cenyuan Zhang
Xiaoqing Zheng
Xuanjing Huang
44
3
0
16 Jun 2024
Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild
Lingni Ma
Yuting Ye
Fangzhou Hong
Vladimir Guzov
Yifeng Jiang
...
C. Karen Liu
Ziwei Liu
Jakob Engel
R. D. Nardi
Richard Newcombe
32
21
0
14 Jun 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Holy Lovenia
Rahmad Mahendra
Salsabil Maulana Akbar
Lester James V. Miranda
Jennifer Santoso
...
Genta Indra Winata
Ruochen Zhang
Fajri Koto
Zheng-Xin Yong
Samuel Cahyawijaya
95
9
0
14 Jun 2024
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Matthieu Futeral
A. Zebaze
Pedro Ortiz Suarez
Julien Abadji
Rémi Lacroix
Cordelia Schmid
Rachel Bawden
Benoît Sagot
39
3
0
13 Jun 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Qingyun Li
Zhe Chen
Weiyun Wang
Wenhai Wang
Shenglong Ye
...
Dahua Lin
Yu Qiao
Botian Shi
Conghui He
Jifeng Dai
VLM
OffRL
56
21
0
12 Jun 2024
Tell Me What's Next: Textual Foresight for Generic UI Representations
Andrea Burns
Kate Saenko
Bryan A. Plummer
LM&Ro
AI4TS
36
4
0
12 Jun 2024
ROADWork Dataset: Learning to Recognize, Observe, Analyze and Drive Through Work Zones
Anurag Ghosh
R. Tamburo
Shen Zheng
Juan R. Alvarez-Padilla
Hailiang Zhu
Michael Cardei
Nicholas Dunn
Christoph Mertz
Srinivasa G. Narasimhan
49
1
0
11 Jun 2024
Situational Awareness Matters in 3D Vision Language Reasoning
Yunze Man
Liang-Yan Gui
Yu-Xiong Wang
43
12
0
11 Jun 2024
Learning Domain-Invariant Features for Out-of-Context News Detection
Yimeng Gu
Mengqi Zhang
Ignacio Castro
Shu Wu
Gareth Tyson
45
2
0
11 Jun 2024
TRINS: Towards Multimodal Language Models that Can Read
Ruiyi Zhang
Yanzhe Zhang
Jian Chen
Yufan Zhou
Jiuxiang Gu
Changyou Chen
Tong Sun
VLM
39
6
0
10 Jun 2024
Zero-Shot Audio Captioning Using Soft and Hard Prompts
Yiming Zhang
Xuenan Xu
Ruoyi Du
Haohe Liu
Yuan Dong
Zheng-Hua Tan
Wenwu Wang
Zhanyu Ma
VLM
35
4
0
10 Jun 2024
Vript: A Video Is Worth Thousands of Words
Dongjie Yang
Suyuan Huang
Chengqiang Lu
Xiaodong Han
Haoxin Zhang
Yan Gao
Yao Hu
Hai Zhao
VGen
80
22
0
10 Jun 2024
FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model
Yebin Lee
Imseong Park
Myungjoo Kang
32
11
0
10 Jun 2024
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
Asmar Nadeem
Faegheh Sardari
R. Dawes
Syed Sameed Husain
Adrian Hilton
Armin Mustafa
57
4
0
10 Jun 2024
Stealthy Targeted Backdoor Attacks against Image Captioning
Wenshu Fan
Hongwei Li
Wenbo Jiang
Meng Hao
Shui Yu
Xiao Zhang
DiffM
27
6
0
09 Jun 2024
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Ke Xu
Ke Xu
AAML
VLM
40
13
0
08 Jun 2024
Seeing the Unseen: Visual Metaphor Captioning for Videos
Abisek Rajakumar Kalarani
Pushpak Bhattacharyya
Sumit Shekhar
VLM
32
1
0
07 Jun 2024
MGIMM: Multi-Granularity Instruction Multimodal Model for Attribute-Guided Remote Sensing Image Detailed Description
Cong Yang
Zuchao Li
Lefei Zhang
52
1
0
07 Jun 2024
DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences
Yidong Huang
Jacob Sansom
Ziqiao Ma
Felix Gervits
Joyce Chai
44
17
0
05 Jun 2024
Multi-layer Learnable Attention Mask for Multimodal Tasks
Wayner Barrios
SouYoung Jin
39
0
0
04 Jun 2024
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing
V. Trinh
Rosy Southwell
Yiwen Guan
Xinlu He
Zhiyong Wang
Jacob Whitehill
OffRL
36
2
0
04 Jun 2024
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
Wenyan Li
Jiaang Li
R. Ramos
Raphael Tang
Desmond Elliott
VLM
41
3
0
04 Jun 2024
Diver: Large Language Model Decoding with Span-Level Mutual Information Verification
Jinliang Lu
Chen Wang
Jiajun Zhang
57
3
0
04 Jun 2024
Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation
Pius von Daniken
Jan Deriu
Don Tuggener
Mark Cieliebak
31
1
0
03 Jun 2024
OLIVE: Object Level In-Context Visual Embeddings
Timothy Ossowski
Junjie Hu
OCL
VLM
57
0
0
02 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
34
0
0
01 Jun 2024
Artemis: Towards Referential Understanding in Complex Videos
Jihao Qiu
Yuan Zhang
Xi Tang
Lingxi Xie
Tianren Ma
Pengyu Yan
David Doermann
Qixiang Ye
Yunjie Tian
VLM
VGen
46
8
0
01 Jun 2024
Are Large Vision Language Models up to the Challenge of Chart Comprehension and Reasoning? An Extensive Investigation into the Capabilities and Limitations of LVLMs
Mohammed Saidul Islam
Raian Rahman
Ahmed Masry
Md Tahmid Rahman Laskar
Mir Tafseer Nayeem
Enamul Hoque
LRM
ELM
38
4
0
01 Jun 2024
Context-aware Difference Distilling for Multi-change Captioning
Yunbin Tu
Liang-Sheng Li
Li Su
Zheng-Jun Zha
Chenggang Yan
Qin Huang
47
7
0
31 May 2024
Benchmarking and Improving Detail Image Caption
Hongyuan Dong
Jiawen Li
Bohong Wu
Jiacong Wang
Yuan Zhang
Haoyuan Guo
VLM
MLLM
35
16
0
29 May 2024
MindSemantix: Deciphering Brain Visual Experiences with a Brain-Language Model
Ziqi Ren
Jie Li
Xuetong Xue
Xin Li
Fan Yang
Zhicheng Jiao
Xinbo Gao
46
3
0
29 May 2024
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification
Laura Fieback
Jakob Spiegelberg
Hanno Gottschalk
MLLM
65
5
0
29 May 2024
Recent Trends in Personalized Dialogue Generation: A Review of Datasets, Methodologies, and Evaluations
Yi-Pei Chen
Noriki Nishida
Hideki Nakayama
Yuji Matsumoto
LLMAG
55
11
0
28 May 2024
Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
Xin Xiao
Bohong Wu
Jiacong Wang
Chunyuan Li
Xun Zhou
Haoyuan Guo
VLM
39
7
0
28 May 2024
UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models
Quan Van Nguyen
Huy Quang Pham
Dan Quang Tran
Thang Kien-Bao Nguyen
Nhat-Hao Nguyen-Dang
Bao-Thien Nguyen-Tat
MedIm
34
1
0
27 May 2024
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
Yunqi Zhang
Songda Li
Chunyuan Deng
Luyi Wang
Hui Zhao
34
0
0
27 May 2024
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
Zejun Li
Ruipu Luo
Jiwen Zhang
Minghui Qiu
Zhongyu Wei
Zhongyu Wei
LRM
MLLM
62
8
0
27 May 2024
Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model
Kuan-Chih Huang
Xiangtai Li
Lu Qi
Shuicheng Yan
Ming-Hsuan Yang
LRM
76
10
0
27 May 2024