CIDEr: Consensus-based Image Description Evaluation

20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,183 papers shown
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
Qirui Jiao
Daoyuan Chen
Yilun Huang
Yaliang Li
Ying Shen
VLM
113
8
0
08 Aug 2024
UNMuTe: Unifying Navigation and Multimodal Dialogue-like Text Generation
Niyati Rawal
Roberto Bigazzi
Lorenzo Baraldi
Rita Cucchiara
LM&Ro
99
1
0
08 Aug 2024
Dual-path Collaborative Generation Network for Emotional Video Captioning
Cheng Ye
Weidong Chen
Jingyu Li
Li Zhang
Zhendong Mao
126
1
0
06 Aug 2024
Multitask and Multimodal Neural Tuning for Large Models
Hao Sun
Yu Song
Jihong Hu
Yen-Wei Chen
Lanfen Lin
VLM
44
0
0
06 Aug 2024
GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
Xianyu Chen
Ming Jiang
Qi Zhao
69
3
0
05 Aug 2024
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark
Koki Maeda
Tosho Hirasawa
Atsushi Hashimoto
Jun Harashima
Leszek Rybicki
Yusuke Fukasawa
Yoshitaka Ushiku
101
0
0
05 Aug 2024
A Novel Evaluation Framework for Image2Text Generation
Jia-Hong Huang
Hongyi Zhu
Yixian Shen
Stevan Rudinac
A. M. Pacces
Evangelos Kanoulas
75
9
0
03 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
108
6
0
31 Jul 2024
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
Ming-Kuan Wu
Xinyue Cai
Jiayi Ji
Jiale Li
Oucheng Huang
Gen Luo
Hao Fei
Xiaoshuai Sun
Rongrong Ji
MLLM
158
13
0
31 Jul 2024
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Xiaowei Chi
Yatian Wang
Aosong Cheng
Pengjun Fang
Zeyue Tian
...
Wenhan Luo
Qifeng Chen
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
135
7
0
30 Jul 2024
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
Dhruv Verma
Debaditya Roy
Basura Fernando
86
1
0
30 Jul 2024
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
75
7
0
29 Jul 2024
A Labeled Ophthalmic Ultrasound Dataset with Medical Report Generation Based on Cross-modal Deep Learning
Jing Wang
Junyan Fan
Meng Zhou
Yanzhu Zhang
Mingyu Shi
38
1
0
26 Jul 2024
HICEScore: A Hierarchical Metric for Image Captioning Evaluation
Zequn Zeng
Jianqiao Sun
Hao Zhang
Tiansheng Wen
Yudi Su
Yan Xie
Zhengjue Wang
Boli Chen
101
3
0
26 Jul 2024
AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description
Junyu Xie
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
74
10
0
22 Jul 2024
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang
Garrett Bingham
Adams Wei Yu
Quoc V. Le
Thang Luong
Golnaz Ghiasi
MLLM LRM
137
13
0
22 Jul 2024
Navigation Instruction Generation with BEV Perception and Large Language Models
Sheng Fan
Rui Liu
Wenguan Wang
Yi Yang
94
9
0
21 Jul 2024
Sim-CLIP: Unsupervised Siamese Adversarial Fine-Tuning for Robust and Semantically-Rich Vision-Language Models
Md Zarif Hossain
Ahmed Imteaj
VLM AAML
66
6
0
20 Jul 2024
Downstream-Pretext Domain Knowledge Traceback for Active Learning
Beichen Zhang
Liang-Sheng Li
Zheng-Jun Zha
Jiebo Luo
Qingming Huang
72
0
0
20 Jul 2024
Semantic-CC: Boosting Remote Sensing Image Change Captioning via Foundational Knowledge and Semantic Guidance
Yongshuo Zhu
Lu Li
Keyan Chen
Chenyang Liu
Fugen Zhou
Z. Shi
78
4
0
19 Jul 2024
NeuroBind: Towards Unified Multimodal Representations for Neural Signals
Fengyu Yang
Chao Feng
Daniel Wang
Tianye Wang
Ziyao Zeng
...
Hyoungseob Park
Pengliang Ji
Han Zhao
Yuanning Li
Alex Wong
113
10
0
19 Jul 2024
Nearest Neighbor Future Captioning: Generating Descriptions for Possible Collisions in Object Placement Tasks
Takumi Komatsu
Motonari Kambara
Shumpei Hatanaka
Haruka Matsuo
Tsubasa Hirakawa
Takayoshi Yamashita
H. Fujiyoshi
Komei Sugiura
66
0
0
18 Jul 2024
MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
Leyang Shen
Gongwei Chen
Rui Shao
Weili Guan
Liqiang Nie
MoE
81
12
0
17 Jul 2024
ModalChorus: Visual Probing and Alignment of Multi-modal Embeddings via Modal Fusion Map
Yilin Ye
Shishi Xiao
Xingchen Zeng
Wei Zeng
114
5
0
17 Jul 2024
Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning
Yunbin Tu
Liang-Sheng Li
Li Su
Chenggang Yan
Qin Huang
102
7
0
16 Jul 2024
FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models
Pengxiang Li
Zhi Gao
Bofei Zhang
Tao Yuan
Yuwei Wu
Mehrtash Harandi
Yunde Jia
Song-Chun Zhu
Qing Li
VLM MLLM
102
6
0
16 Jul 2024
Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights
Shunqi Mao
Chaoyi Zhang
Hang Su
Hwanjun Song
Igor Shalyminov
Weidong Cai
72
1
0
16 Jul 2024
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
Shraman Pramanick
Rama Chellappa
Subhashini Venugopalan
111
21
0
12 Jul 2024
Controllable Navigation Instruction Generation with Chain of Thought Prompting
Xianghao Kong
Jinyu Chen
Wenguan Wang
Hang Su
Xiaolin Hu
Yi Yang
Si Liu
LRM
105
9
0
10 Jul 2024
LEMoN: Label Error Detection using Multimodal Neighbors
Haoran Zhang
Aparna Balagopalan
Nassim Oufattole
Hyewon Jeong
Yan Wu
Jiacheng Zhu
Marzyeh Ghassemi
128
0
0
10 Jul 2024
Vision-Language Models under Cultural and Inclusive Considerations
Antonia Karamolegkou
Phillip Rust
Yong Cao
Ruixiang Cui
Anders Søgaard
Daniel Hershcovich
VLM
117
8
0
08 Jul 2024
OneDiff: A Generalist Model for Image Difference Captioning
Erdong Hu
Longteng Guo
Tongtian Yue
Zijia Zhao
Shuning Xue
Jing Liu
VLM
121
2
0
08 Jul 2024
Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference
Kai Shen
Lingfei Wu
Siliang Tang
Fangli Xu
Bo Long
Yueting Zhuang
Jian Pei
70
0
0
06 Jul 2024
Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition
Aditya K Surikuchi
Raquel Fernández
Sandro Pezzelle
61
6
0
05 Jul 2024
M5 -- A Diverse Benchmark to Assess the Performance of Large Multimodal Models Across Multilingual and Multicultural Vision-Language Tasks
Florian Schneider
Sunayana Sitaram
VLM
81
12
0
04 Jul 2024
Multi-Modal Video Dialog State Tracking in the Wild
Adnen Abdessaied
Lei Shi
Andreas Bulling
59
2
0
02 Jul 2024
Extracting and Encoding: Leveraging Large Language Models and Medical Knowledge to Enhance Radiological Text Representation
Pablo Messina
René Vidal
Denis Parra
Álvaro Soto
Vladimir Araujo
MedIm
113
4
0
02 Jul 2024
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding
Jinghui Lu
Haiyang Yu
Yanjie Wang
Yongjie Ye
Jingqun Tang
...
Qi Liu
Hao Feng
Han Wang
Hao Liu
Can Huang
178
23
0
02 Jul 2024
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
Sanjoy Chowdhury
Sayan Nag
Subhrajyoti Dasgupta
Jun Chen
Mohamed Elhoseiny
Ruohan Gao
Dinesh Manocha
VLM MLLM
98
15
0
01 Jul 2024
From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning
Nan Xu
Fei Wang
Sheng Zhang
Hoifung Poon
Muhao Chen
139
7
0
01 Jul 2024
Tarsier: Recipes for Training and Evaluating Large Video Description Models
Jiawei Wang
Liping Yuan
Yuchen Zhang
108
67
0
30 Jun 2024
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
Jiaxin Zhang
Wentao Yang
Songxuan Lai
Zecheng Xie
Lianwen Jin
94
21
0
27 Jun 2024
MatchTime: Towards Automatic Soccer Game Commentary Generation
Jiayuan Rao
Haoning Wu
Chang-rui Liu
Yanfeng Wang
Weidi Xie
88
8
0
26 Jun 2024
GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension
Jiafeng Liang
Shixin Jiang
Zekun Wang
Haojie Pan
Zerui Chen
Zheng Chu
Ming Liu
Ruiji Fu
Zhongyuan Wang
Bing Qin
69
3
0
26 Jun 2024
X-ray Made Simple: Lay Radiology Report Generation and Robust Evaluation
Kun Zhao
Chenghao Xiao
Chen Tang
Bohao Yang
Kai Ye
Noura Al Moubayed
Liang Zhan
Chenghua Lin
86
0
0
25 Jun 2024
UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos
Yuting Mei
Linli Yao
Qin Jin
65
1
0
24 Jun 2024
Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models?
Gregor Geigle
Radu Timofte
Goran Glavaš
83
0
0
20 Jun 2024
From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment
Yusuke Hirota
Ryo Hachiuma
Chao-Han Huck Yang
Yuta Nakashima
VLM
89
4
0
20 Jun 2024
Adaptable Logical Control for Large Language Models
Honghua Zhang
Po-Nien Kung
Masahiro Yoshida
Guy Van den Broeck
Nanyun Peng
71
10
0
19 Jun 2024
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images
Rushikesh Zawar
Shaurya Dewan
Andrew F. Luo
Margaret M. Henderson
Michael J. Tarr
Leila Wehbe
VGen CoGe
76
1
0
19 Jun 2024