CIDEr: Consensus-based Image Description Evaluation

20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,183 papers shown
GeoGuess: Multimodal Reasoning based on Hierarchy of Visual Information in Street View
Fenghua Cheng
Jinxiang Wang
Sen Wang
Zi Huang
Xue Li
LRM
21
0
0
19 Jun 2025
PR-DETR: Injecting Position and Relation Prior for Dense Video Captioning
Yizhe Li
Sanping Zhou
Zheng Qin
Le Wang
ViT
17
0
0
19 Jun 2025
DiscoSG: Towards Discourse-Level Text Scene Graph Parsing through Iterative Graph Refinement
Shaoqing Lin
Chong Teng
Fei Li
Donghong Ji
Lizhen Qu
Z. Li
29
0
0
18 Jun 2025
Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability
Yusuke Sakai
Hidetaka Kamigaito
Taro Watanabe
LRM
29
0
0
18 Jun 2025
An Empirical Study of LLM-as-a-Judge: How Design Choices Impact Evaluation Reliability
Yusuke Yamauchi
Taro Yano
Masafumi Oyamada
ELM
22
0
0
16 Jun 2025
Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration
Jun Wang
Lixing Zhu
Xiaohan Yu
A. Bhalerao
Yulan He
122
0
0
12 Jun 2025
TRIDENT: Temporally Restricted Inference via DFA-Enhanced Neural Traversal
Vincenzo Collura
Karim Tit
Laura Bussi
Eleonora Giunchiglia
Maxime Cordy
56
0
0
11 Jun 2025
A Novel Lightweight Transformer with Edge-Aware Fusion for Remote Sensing Image Captioning
Swadhin Das
Divyansh Mundra
Priyanshu Dayal
Raksha Sharma
47
0
0
11 Jun 2025
Generating Vision-Language Navigation Instructions Incorporated Fine-Grained Alignment Annotations
Yibo Cui
Liang Xie
Yu Zhao
Jiawei Sun
Erwei Yin
17
0
0
10 Jun 2025
LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments
Jin Huang
Yuchao Jin
Le An
Josh Park
VLM
12
0
0
09 Jun 2025
Unblocking Fine-Grained Evaluation of Detailed Captions: An Explaining AutoRater and Critic-and-Revise Pipeline
Brian Gordon
Yonatan Bitton
Andreea Marzoca
Yasumasa Onoe
Xiao Wang
Daniel Cohen-Or
Idan Szpektor
CoGe
16
0
0
09 Jun 2025
FREE: Fast and Robust Vision Language Models with Early Exits
Divya J. Bajpai
M. Hanawal
VLM
17
0
0
07 Jun 2025
ExAct: A Video-Language Benchmark for Expert Action Analysis
Han Yi
Yulu Pan
Feihong He
Xinyu Liu
Benjamin Zhang
Oluwatumininu Oguntola
Gedas Bertasius
57
0
0
06 Jun 2025
AuthGuard: Generalizable Deepfake Detection via Language Guidance
Guangyu Shen
Zhihua Li
Xiang Xu
Tianchen Zhao
Zheng Zhang
Dongsheng An
Zhuowen Tu
Yifan Xing
Qin Zhang
23
0
0
04 Jun 2025
Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments
Di Wen
Lei Qi
Kunyu Peng
Kailun Yang
Fei Teng
...
Yufan Chen
R. Liu
Yitian Shi
M. Sarfraz
Rainer Stiefelhagen
64
0
0
03 Jun 2025
Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation
Israa A. Albadarneh
Bassam Hammo
Omar Al-Kadi
VLM
29
0
0
03 Jun 2025
VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in Videos
Baoyu Liang
Qile Su
Shoutai Zhu
Yuchen Liang
Chao Tong
VGen
51
1
0
03 Jun 2025
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
Hyojin Bahng
Caroline Chan
F. Durand
Phillip Isola
EGVM
29
0
0
02 Jun 2025
CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
Daiki Takeuchi
Binh Thien Nguyen
Masahiro Yasuda
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
VLM
35
0
0
01 Jun 2025
The Security Threat of Compressed Projectors in Large Vision-Language Models
Yudong Zhang
Ruobing Xie
Xingwu Sun
Jiansheng Chen
Zhanhui Kang
Di Wang
Yu Wang
19
0
0
31 May 2025
Navigating the Accuracy-Size Trade-Off with Flexible Model Merging
Akash Dhasade
Divyansh Jhunjhunwala
Milos Vujasinovic
Gauri Joshi
Anne-Marie Kermarrec
MoMe
66
0
0
29 May 2025
VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation
Shi-Xue Zhang
Hongfa Wang
Duojun Huang
Xin Li
Xiaobin Zhu
Xu-Cheng Yin
CoGe
63
0
0
29 May 2025
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
Yuchi Wang
Yishuo Cai
Shuhuai Ren
Sihan Yang
Linli Yao
Yuanxin Liu
Y. Zhang
Pengfei Wan
Xu Sun
VLM
62
0
0
28 May 2025
GETReason: Enhancing Image Context Extraction through Hierarchical Multi-Agent Reasoning
Shikhhar Siingh
Abhinav Rawat
Chitta Baral
Vivek Gupta
38
0
0
28 May 2025
TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos
Fanheng Kong
Jingyuan Zhang
Hongzhi Zhang
Shi Feng
Daling Wang
Linhao Yu
Xingguang Ji
Yu Tian
Qi Wang
Fuzheng Zhang
58
1
0
26 May 2025
Chain-of-Thought for Autonomous Driving: A Comprehensive Survey and Future Prospects
Yixin Cui
Haotian Lin
Shuo Yang
Yixiao Wang
Yanjun Huang
Hong Chen
LM&Ro
LRM
ELM
121
0
0
26 May 2025
Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM
Donghwan Chi
Hyomin Kim
Yoonjin Oh
Yongjin Kim
Donghoon Lee
DaeJin Jo
Jongmin Kim
Junyeob Baek
Sungjin Ahn
Sungwoong Kim
MLLM
VLM
480
0
0
23 May 2025
Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning
Cheng Peng
Kai Zhang
Mengxian Lyu
Hongfang Liu
Lichao Sun
Yonghui Wu
LM&MA
MedIm
VLM
278
0
0
23 May 2025
Panoptic Captioning: Seeking An Equivalency Bridge for Image and Text
Kun-Yu Lin
Hongjun Wang
Weining Ren
Kai Han
291
0
0
22 May 2025
Redemption Score: An Evaluation Framework to Rank Image Captions While Redeeming Image Semantics and Language Pragmatics
Ashim Dahal
Ankit Ghimire
Saydul Akbar Murad
Nick Rahimi
54
0
0
22 May 2025
Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports
Francesco Dalla Serra
Patrick Schrempf
Chaoyang Wang
Zaiqiao Meng
Fani Deligianni
Alison Q. O'Neil
43
0
0
22 May 2025
DC-Scene: Data-Centric Learning for 3D Scene Understanding
Ting Huang
Zeyu Zhang
Ruicheng Zhang
Yang Zhao
83
0
0
21 May 2025
Exploring The Visual Feature Space for Multimodal Neural Decoding
Weihao Xia
Cengiz Öztireli
75
0
0
21 May 2025
TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration
Yanshu Li
Tian Yun
Jianjiang Yang
Pinyuan Feng
Jinfa Huang
Ruixiang Tang
69
2
0
21 May 2025
Harnessing Caption Detailness for Data-Efficient Text-to-Image Generation
Xinran Wang
Muxi Diao
Yuanzhi Liu
Chunyu Wang
Kongming Liang
Zhanyu Ma
Jun Guo
94
0
0
21 May 2025
TinyDrive: Multiscale Visual Question Answering with Selective Token Routing for Autonomous Driving
Hossein Hassani
Soodeh Nikan
Abdallah Shami
MLLM
143
0
0
21 May 2025
ALN-P3: Unified Language Alignment for Perception, Prediction, and Planning in Autonomous Driving
Yunsheng Ma
Burhaneddin Yaman
Xin Ye
Mahmut Yurt
Jingru Luo
Abhirup Mallik
Ziran Wang
Liu Ren
106
0
0
21 May 2025
Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives
Xingxing Weng
Chao Pang
Gui-Song Xia
VLM
115
0
0
20 May 2025
KERL: Knowledge-Enhanced Personalized Recipe Recommendation using Large Language Models
Fnu Mohbat
Mohammed J Zaki
47
0
0
20 May 2025
TS-VLM: Text-Guided SoftSort Pooling for Vision-Language Models in Multi-View Driving Reasoning
Lihong Chen
Hossein Hassani
Soodeh Nikan
VLM
104
0
0
19 May 2025
Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model
Yong Ren
Chenxing Li
Le Xu
Hao Gu
Duzhen Zhang
Yujie Chen
Manjie Xu
Ruibo Fu
Shan Yang
Dong Yu
LRM
92
0
0
19 May 2025
A Light and Smart Wearable Platform with Multimodal Foundation Model for Enhanced Spatial Reasoning in People with Blindness and Low Vision
Alexey Magay
Dhurba Tripathi
Yu Hao
Yi Fang
79
0
0
16 May 2025
Generative Models in Computational Pathology: A Comprehensive Survey on Methods, Applications, and Challenges
Yuan Zhang
Xinfeng Zhang
Xiaoming Qi
Xinyu Wu
Feng Chen
Guanyu Yang
Huazhu Fu
MedIm
LM&MA
AI4CE
163
0
0
16 May 2025
Ultrasound Report Generation with Multimodal Large Language Models for Standardized Texts
Peixuan Ge
Tongkun Su
Faqin Lv
Baoliang Zhao
Peng Zhang
...
Liang Yao
Yu Sun
Zenan Wang
Pak Kin Wong
Ying Hu
MedIm
57
0
0
13 May 2025
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
Zongchuang Zhao
Haoyu Fu
Dingkang Liang
Xin Zhou
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
MLLM
VLM
125
0
0
13 May 2025
DriveSOTIF: Advancing Perception SOTIF Through Multimodal Large Language Models
Shucheng Huang
Freda Shi
Chen Sun
Jiaming Zhong
Minghao Ning
Yufeng Yang
Yukun Lu
Hong Wang
A. Khajepour
95
0
0
11 May 2025
SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios
Ning Cheng
Jinan Xu
Jialing Chen
Wenjuan Han
LRM
77
0
0
07 May 2025
LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning
Joy Lim Jia Yin
Daniel Zhang-Li
Jifan Yu
Haoyang Li
Shangqing Tu
...
Zhiyuan Liu
Huiqin Liu
Lei Hou
Juanzi Li
Bin Xu
77
0
0
04 May 2025
Multimodal Large Language Models for Medicine: A Comprehensive Survey
Jiarui Ye
Hao Tang
LM&MA
183
0
0
29 Apr 2025
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
Run Luo
Renke Shan
Longze Chen
Ziqiang Liu
Lu Wang
Min Yang
Xiaobo Xia
MLLM
VLM
262
1
0
28 Apr 2025