ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1411.5726
  4. Cited By
CIDEr: Consensus-based Image Description Evaluation

CIDEr: Consensus-based Image Description Evaluation

20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
ArXivPDFHTML

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,137 papers shown
Title
See It All: Contextualized Late Aggregation for 3D Dense Captioning
See It All: Contextualized Late Aggregation for 3D Dense Captioning
Minjung Kim
Hyung Suk Lim
Seung Hwan Kim
Soonyoung Lee
Bumsoo Kim
Gunhee Kim
55
4
0
14 Aug 2024
Bi-directional Contextual Attention for 3D Dense Captioning
Bi-directional Contextual Attention for 3D Dense Captioning
Minjung Kim
Hyung Suk Lim
Soonyoung Lee
Bumsoo Kim
Gunhee Kim
43
3
0
13 Aug 2024
Context-aware Visual Storytelling with Visual Prefix Tuning and
  Contrastive Learning
Context-aware Visual Storytelling with Visual Prefix Tuning and Contrastive Learning
Yingjin Song
Denis Paperno
Albert Gatt
29
0
0
12 Aug 2024
Speech vs. Transcript: Does It Matter for Human Annotators in Speech
  Summarization?
Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?
Roshan S. Sharma
Suwon Shon
Mark Lindsey
Hira Dhamyal
Rita Singh
Bhiksha Raj
56
1
0
12 Aug 2024
Hyperbolic Learning with Multimodal Large Language Models
Hyperbolic Learning with Multimodal Large Language Models
Paolo Mandica
Luca Franco
Konstantinos Kallidromitis
Suzanne Petryk
Fabio Galasso
44
1
0
09 Aug 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
35
0
0
09 Aug 2024
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language
  Models
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
Qirui Jiao
Daoyuan Chen
Yilun Huang
Yaliang Li
Ying Shen
VLM
40
5
0
08 Aug 2024
UNMuTe: Unifying Navigation and Multimodal Dialogue-like Text Generation
UNMuTe: Unifying Navigation and Multimodal Dialogue-like Text Generation
Niyati Rawal
Roberto Bigazzi
Lorenzo Baraldi
Rita Cucchiara
LM&Ro
37
1
0
08 Aug 2024
Dual-path Collaborative Generation Network for Emotional Video
  Captioning
Dual-path Collaborative Generation Network for Emotional Video Captioning
Cheng Ye
Weidong Chen
Jingyu Li
L. Zhang
Zhendong Mao
92
1
0
06 Aug 2024
Multitask and Multimodal Neural Tuning for Large Models
Multitask and Multimodal Neural Tuning for Large Models
Hao Sun
Yu Song
Jihong Hu
Yen-Wei Chen
Lanfen Lin
VLM
27
0
0
06 Aug 2024
GazeXplain: Learning to Predict Natural Language Explanations of Visual
  Scanpaths
GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
Xianyu Chen
Ming Jiang
Qi Zhao
24
2
0
05 Aug 2024
COM Kitchens: An Unedited Overhead-view Video Dataset as a
  Vision-Language Benchmark
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark
Koki Maeda
Tosho Hirasawa
Atsushi Hashimoto
Jun Harashima
Leszek Rybicki
Yusuke Fukasawa
Yoshitaka Ushiku
48
0
0
05 Aug 2024
A Novel Evaluation Framework for Image2Text Generation
A Novel Evaluation Framework for Image2Text Generation
Jia-Hong Huang
Hongyi Zhu
Yixian Shen
S. Rudinac
A. M. Pacces
Evangelos Kanoulas
47
7
0
03 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
46
5
0
31 Jul 2024
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large
  Language Models
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
Ming-Kuan Wu
Xinyue Cai
Jiayi Ji
Jiale Li
Oucheng Huang
Gen Luo
Hao Fei
Xiaoshuai Sun
Rongrong Ji
MLLM
51
7
0
31 Jul 2024
MMTrail: A Multimodal Trailer Video Dataset with Language and Music
  Descriptions
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Xiaowei Chi
Yatian Wang
Aosong Cheng
Pengjun Fang
Zeyue Tian
...
Wenhan Luo
Qifeng Chen
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
75
7
0
30 Jul 2024
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
Dhruv Verma
Debaditya Roy
Basura Fernando
27
1
0
30 Jul 2024
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger
  Visual Cues
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
41
6
0
29 Jul 2024
A Labeled Ophthalmic Ultrasound Dataset with Medical Report Generation
  Based on Cross-modal Deep Learning
A Labeled Ophthalmic Ultrasound Dataset with Medical Report Generation Based on Cross-modal Deep Learning
Jing Wang
Junyan Fan
Meng Zhou
Yanzhu Zhang
Mingyu Shi
23
1
0
26 Jul 2024
HICEScore: A Hierarchical Metric for Image Captioning Evaluation
HICEScore: A Hierarchical Metric for Image Captioning Evaluation
Zequn Zeng
Jianqiao Sun
Hao Zhang
Tiansheng Wen
Yudi Su
Yan Xie
Zhengjue Wang
Boli Chen
49
3
0
26 Jul 2024
AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description
AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description
Junyu Xie
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
39
8
0
22 Jul 2024
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal
  Reasoning
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang
Garrett Bingham
Adams Wei Yu
Quoc V. Le
Thang Luong
Golnaz Ghiasi
MLLM
LRM
45
9
0
22 Jul 2024
Navigation Instruction Generation with BEV Perception and Large Language
  Models
Navigation Instruction Generation with BEV Perception and Large Language Models
Sheng Fan
Rui Liu
Wenguan Wang
Yi Yang
45
5
0
21 Jul 2024
Sim-CLIP: Unsupervised Siamese Adversarial Fine-Tuning for Robust and
  Semantically-Rich Vision-Language Models
Sim-CLIP: Unsupervised Siamese Adversarial Fine-Tuning for Robust and Semantically-Rich Vision-Language Models
Md Zarif Hossain
Ahmed Imteaj
VLM
AAML
42
4
0
20 Jul 2024
Downstream-Pretext Domain Knowledge Traceback for Active Learning
Downstream-Pretext Domain Knowledge Traceback for Active Learning
Beichen Zhang
Liang-Sheng Li
Zheng-Jun Zha
Jiebo Luo
Qingming Huang
41
0
0
20 Jul 2024
Semantic-CC: Boosting Remote Sensing Image Change Captioning via
  Foundational Knowledge and Semantic Guidance
Semantic-CC: Boosting Remote Sensing Image Change Captioning via Foundational Knowledge and Semantic Guidance
Yongshuo Zhu
Lu Li
Keyan Chen
Chenyang Liu
Fugen Zhou
Z. Shi
37
4
0
19 Jul 2024
NeuroBind: Towards Unified Multimodal Representations for Neural Signals
NeuroBind: Towards Unified Multimodal Representations for Neural Signals
Fengyu Yang
Chao Feng
Daniel Wang
Tianye Wang
Ziyao Zeng
...
Hyoungseob Park
Pengliang Ji
Han Zhao
Yuanning Li
Alex Wong
39
9
0
19 Jul 2024
Nearest Neighbor Future Captioning: Generating Descriptions for Possible
  Collisions in Object Placement Tasks
Nearest Neighbor Future Captioning: Generating Descriptions for Possible Collisions in Object Placement Tasks
Takumi Komatsu
Motonari Kambara
Shumpei Hatanaka
Haruka Matsuo
Tsubasa Hirakawa
Takayoshi Yamashita
H. Fujiyoshi
Komei Sugiura
43
0
0
18 Jul 2024
MoME: Mixture of Multimodal Experts for Generalist Multimodal Large
  Language Models
MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
Leyang Shen
Gongwei Chen
Rui Shao
Weili Guan
Liqiang Nie
MoE
45
6
0
17 Jul 2024
ModalChorus: Visual Probing and Alignment of Multi-modal Embeddings via
  Modal Fusion Map
ModalChorus: Visual Probing and Alignment of Multi-modal Embeddings via Modal Fusion Map
Yilin Ye
Shishi Xiao
Xingchen Zeng
Wei Zeng
46
3
0
17 Jul 2024
Distractors-Immune Representation Learning with Cross-modal Contrastive
  Regularization for Change Captioning
Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning
Yunbin Tu
Liang-Sheng Li
Li Su
Chenggang Yan
Qin Huang
40
5
0
16 Jul 2024
FIRE: A Dataset for Feedback Integration and Refinement Evaluation of
  Multimodal Models
FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models
Pengxiang Li
Zhi Gao
Bofei Zhang
Tao Yuan
Yuwei Wu
Mehrtash Harandi
Yunde Jia
Song-Chun Zhu
Qing Li
VLM
MLLM
48
3
0
16 Jul 2024
Controllable Contextualized Image Captioning: Directing the Visual
  Narrative through User-Defined Highlights
Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights
Shunqi Mao
Chaoyi Zhang
Hang Su
Hwanjun Song
Igor Shalyminov
Weidong Cai
39
1
0
16 Jul 2024
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
Shraman Pramanick
Rama Chellappa
Subhashini Venugopalan
50
14
0
12 Jul 2024
LEMoN: Label Error Detection using Multimodal Neighbors
LEMoN: Label Error Detection using Multimodal Neighbors
Haoran Zhang
Aparna Balagopalan
Nassim Oufattole
Hyewon Jeong
Yan Wu
Jiacheng Zhu
Marzyeh Ghassemi
46
0
0
10 Jul 2024
Controllable Navigation Instruction Generation with Chain of Thought
  Prompting
Controllable Navigation Instruction Generation with Chain of Thought Prompting
Xianghao Kong
Jinyu Chen
Wenguan Wang
Hang Su
Xiaolin Hu
Yi Yang
Si Liu
LRM
45
4
0
10 Jul 2024
Vision-Language Models under Cultural and Inclusive Considerations
Vision-Language Models under Cultural and Inclusive Considerations
Antonia Karamolegkou
Phillip Rust
Yong Cao
Ruixiang Cui
Anders Søgaard
Daniel Hershcovich
VLM
53
7
0
08 Jul 2024
OneDiff: A Generalist Model for Image Difference Captioning
OneDiff: A Generalist Model for Image Difference Captioning
Erdong Hu
Longteng Guo
Tongtian Yue
Zijia Zhao
Shuning Xue
Jing Liu
VLM
31
2
0
08 Jul 2024
Ask Questions with Double Hints: Visual Question Generation with
  Answer-awareness and Region-reference
Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference
Kai Shen
Lingfei Wu
Siliang Tang
Fangli Xu
Bo Long
Yueting Zhuang
Jian Pei
35
0
0
06 Jul 2024
Not (yet) the whole story: Evaluating Visual Storytelling Requires More
  than Measuring Coherence, Grounding, and Repetition
Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition
Aditya K Surikuchi
Raquel Fernández
Sandro Pezzelle
26
3
0
05 Jul 2024
M5 -- A Diverse Benchmark to Assess the Performance of Large Multimodal
  Models Across Multilingual and Multicultural Vision-Language Tasks
M5 -- A Diverse Benchmark to Assess the Performance of Large Multimodal Models Across Multilingual and Multicultural Vision-Language Tasks
Florian Schneider
Sunayana Sitaram
VLM
45
7
0
04 Jul 2024
Multi-Modal Video Dialog State Tracking in the Wild
Multi-Modal Video Dialog State Tracking in the Wild
Adnen Abdessaied
Lei Shi
Andreas Bulling
19
2
0
02 Jul 2024
Extracting and Encoding: Leveraging Large Language Models and Medical
  Knowledge to Enhance Radiological Text Representation
Extracting and Encoding: Leveraging Large Language Models and Medical Knowledge to Enhance Radiological Text Representation
Pablo Messina
René Vidal
Denis Parra
Álvaro Soto
Vladimir Araujo
MedIm
64
2
0
02 Jul 2024
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding
Jinghui Lu
Haiyang Yu
Yanjie Wang
Yongjie Ye
Jingqun Tang
...
Qi Liu
Hao Feng
Han Wang
Hao Liu
Can Huang
50
19
0
02 Jul 2024
Meerkat: Audio-Visual Large Language Model for Grounding in Space and
  Time
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
Sanjoy Chowdhury
Sayan Nag
Subhrajyoti Dasgupta
Jun Chen
Mohamed Elhoseiny
Ruohan Gao
Dinesh Manocha
VLM
MLLM
41
9
0
01 Jul 2024
From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning
From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning
Nan Xu
Fei Wang
Sheng Zhang
Hoifung Poon
Muhao Chen
36
6
0
01 Jul 2024
Tarsier: Recipes for Training and Evaluating Large Video Description
  Models
Tarsier: Recipes for Training and Evaluating Large Video Description Models
Jiawei Wang
Liping Yuan
Yuchen Zhang
47
52
0
30 Jun 2024
DocKylin: A Large Multimodal Model for Visual Document Understanding
  with Efficient Visual Slimming
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
Jiaxin Zhang
Wentao Yang
Songxuan Lai
Zecheng Xie
Lianwen Jin
37
15
0
27 Jun 2024
MatchTime: Towards Automatic Soccer Game Commentary Generation
MatchTime: Towards Automatic Soccer Game Commentary Generation
Jiayuan Rao
Haoning Wu
Chang-rui Liu
Yanfeng Wang
Weidi Xie
43
7
0
26 Jun 2024
GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension
GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension
Jiafeng Liang
Shixin Jiang
Zekun Wang
Haojie Pan
Zerui Chen
Zheng Chu
Ming Liu
Ruiji Fu
Zhongyuan Wang
Bing Qin
29
2
0
26 Jun 2024
Previous
123...567...414243
Next