CIDEr: Consensus-based Image Description Evaluation

20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,183 papers shown
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM AuLLM
173
12
0
26 Sep 2024
Domain-Independent Automatic Generation of Descriptive Texts for Time-Series Data
Kota Dohi
Aoi Ito
Harsh Purohit
Tomoya Nishida
Takashi Endo
Yohei Kawaguchi
55
3
0
25 Sep 2024
Overview of the First Shared Task on Clinical Text Generation: RRG24 and "Discharge Me!"
Justin Xu
Zhihong Chen
Andrew Johnston
Louis Blankemeier
Maya Varma
...
Ankit Modi
Robert Lloyd
Benjamin Hopkins
Curtis Langlotz
Jean-Benoit Delbrouck
LM&MA
95
26
0
25 Sep 2024
A-VL: Adaptive Attention for Large Vision-Language Models
Junyang Zhang
Mu Yuan
Ruiguang Zhong
Puhan Luo
Huiyou Zhan
Ningkang Zhang
Chengchen Hu
Xiangyang Li
VLM
129
1
0
23 Sep 2024
@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology
Xin Jiang
Junwei Zheng
Ruiping Liu
Jiahang Li
Jiaming Zhang
Sven Matthiesen
Rainer Stiefelhagen
VLM
45
1
0
21 Sep 2024
Enhancing Advanced Visual Reasoning Ability of Large Language Models
Zhiyuan Li
Dongnan Liu
Chaoyi Zhang
Heng Wang
Tengfei Xue
Weidong Cai
VLM LRM
124
10
0
21 Sep 2024
HUT: A More Computation Efficient Fine-Tuning Method With Hadamard Updated Transformation
Geyuan Zhang
Xiaofei Zhou
Chuheng Chen
38
0
0
20 Sep 2024
CLAIR-A: Leveraging Large Language Models to Judge Audio Captions
Tsung-Han Wu
Joseph E. Gonzalez
Trevor Darrell
David M. Chan
127
2
0
19 Sep 2024
Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering
Youngsun Lim
Hojun Choi
Hyunjung Shim
HILM EGVM MLLM
63
1
0
19 Sep 2024
Enhancing Perception of Key Changes in Remote Sensing Image Change Captioning
Cong Yang
Zuchao Li
Hongzan Jiao
Zhi Gao
Lefei Zhang
63
1
0
19 Sep 2024
KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph
Yanbei Jiang
Krista A. Ehinger
Jey Han Lau
SLR
72
1
0
17 Sep 2024
Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models
Bingchen Liu
Ehsan Akhgari
Alexander Visheratin
Aleks Kamko
Linmiao Xu
Shivam Shrirao
Joao Souza
Suhail Doshi
Daiqing Li
DiffM MLLM
109
60
0
16 Sep 2024
Video Token Sparsification for Efficient Multimodal LLMs in Autonomous Driving
Yunsheng Ma
Amr Abdelraouf
Rohit Gupta
Ziran Wang
Kyungtae Han
104
3
0
16 Sep 2024
Automatic Scene Generation: State-of-the-Art Techniques, Models, Datasets, Challenges, and Future Prospects
Awal Ahmed Fime
Saifuddin Mahmud
Arpita Das
Md. Sunzidul Islam
Hong-Hoon Kim
VGen 3DV
42
1
0
14 Sep 2024
Towards Diverse and Efficient Audio Captioning via Diffusion Models
Manjie Xu
Chenxing Li
Xinyi Tu
Yong Ren
Ruibo Fu
Wei Liang
Dong Yu
DiffM
93
2
0
14 Sep 2024
ChangeChat: An Interactive Model for Remote Sensing Change Analysis via Multimodal Instruction Tuning
Pei Deng
Wenqian Zhou
Hanlin Wu
57
3
0
13 Sep 2024
Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks
Md Zarif Hossain
Ahmed Imteaj
AAML VLM
81
6
0
11 Sep 2024
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
Yang Liu
Pengxiang Ding
Siteng Huang
Min Zhang
Han Zhao
Donglin Wang
84
7
0
11 Sep 2024
Hint-AD: Holistically Aligned Interpretability in End-to-End Autonomous Driving
Kairui Ding
Boyuan Chen
Yuchen Su
Huan-ang Gao
Bu Jin
...
Wuqiang Zhang
Xiaohui Li
Paul Barsch
Hongyang Li
Hao Zhao
105
7
0
10 Sep 2024
Enhancing Long Video Understanding via Hierarchical Event-Based Memory
Dingxin Cheng
Mingda Li
Jingyu Liu
Yongxin Guo
Bin Jiang
Qingbin Liu
Xi Chen
Bo Zhao
90
4
0
10 Sep 2024
Spatially-Aware Speaker for Vision-and-Language Navigation Instruction Generation
Muraleekrishna Gopinathan
Martin Masek
Jumana Abu-Khalaf
David Suter
LM&Ro
79
2
0
09 Sep 2024
KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models
Yingshu Li
Zhanyu Wang
Yunyi Liu
Lei Wang
Lingqiao Liu
Luping Zhou
68
3
0
09 Sep 2024
FODA-PG for Enhanced Medical Imaging Narrative Generation: Adaptive Differentiation of Normal and Abnormal Attributes
Kai Shu
Yuzhuo Jia
Ziyang Zhang
Jiechao Gao
MedIm
87
0
0
06 Sep 2024
Question-Answering Dense Video Events
Hangyu Qin
Junbin Xiao
Angela Yao
VLM
123
1
0
06 Sep 2024
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Yunze Man
Shuhong Zheng
Zhipeng Bao
M. Hebert
Liang-Yan Gui
Yu-Xiong Wang
142
23
0
05 Sep 2024
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
Manu Gaur
Darshan Singh
Makarand Tapaswi
461
1
0
04 Sep 2024
Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering
Su Hyeon Lim
Minkuk Kim
Hyeon Bae Kim
Seong Tae Kim
ReLM LRM
76
0
0
30 Aug 2024
Medical Report Generation Is A Multi-label Classification Problem
Yijian Fan
Zhenbang Yang
Rui Liu
Mingjie Li
Xiaojun Chang
MedIm
131
1
0
30 Aug 2024
LLaVA-Chef: A Multi-modal Generative Model for Food Recipes
Fnu Mohbat
Mohammed J. Zaki
84
8
0
29 Aug 2024
See or Guess: Counterfactually Regularized Image Captioning
Qian Cao
Xu Chen
Ruihua Song
Xiting Wang
Xinting Huang
Yuchen Ren
CML
92
1
0
29 Aug 2024
AutoGeo: Automating Geometric Image Dataset Creation for Enhanced Geometry Understanding
Zihan Huang
Tao Wu
Wang Lin
Shengyu Zhang
Jingyuan Chen
Fei Wu
64
12
0
28 Aug 2024
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV VLM
88
1
0
28 Aug 2024
AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models
Fanglong Yao
Yuanchang Yue
Youzhi Liu
Xian Sun
Kun Fu
VGen EgoV
66
8
0
28 Aug 2024
Fine-grained length controllable video captioning with ordinal embeddings
Tomoya Nitta
Takumi Fukuzawa
Toru Tamaki
98
0
0
27 Aug 2024
What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation
Dingyi Yang
Qin Jin
130
7
0
26 Aug 2024
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
Nicholas Moratelli
Davide Caffagni
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
CLIP
95
3
0
26 Aug 2024
Semantic Alignment for Multimodal Large Language Models
Tao Wu
Mengze Li
Jingyuan Chen
Wei Ji
Wang Lin
Jinyang Gao
Kun Kuang
Zhou Zhao
Fei Wu
89
7
0
23 Aug 2024
TRRG: Towards Truthful Radiology Report Generation With Cross-modal Disease Clue Enhanced Large Language Model
Yuhao Wang
Chao Hao
Yawen Cui
Xinqi Su
Weicheng Xie
Tao Tan
Zitong Yu
LM&MA MedIm
75
0
0
22 Aug 2024
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Chunting Zhou
Lili Yu
Arun Babu
Kushal Tirumala
Michihiro Yasunaga
Leonid Shamis
Jacob Kahn
Xuezhe Ma
Luke Zettlemoyer
Omer Levy
DiffM
130
190
0
20 Aug 2024
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
Haoning Wu
Shaocheng Shen
Qiang Hu
Xiaoyun Zhang
Ya Zhang
Yanfeng Wang
114
11
0
20 Aug 2024
R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation
Xiao Wang
Yuehang Li
Fuling Wang
Shiao Wang
Chuanfu Li
Bo Jiang
MedIm
79
8
0
19 Aug 2024
Quality Assessment in the Era of Large Models: A Survey
Zicheng Zhang
Yingjie Zhou
Chunyi Li
Baixuan Zhao
Xiaohong Liu
Guangtao Zhai
103
12
0
17 Aug 2024
Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices
Patrícia Schmidtová
Saad Mahamood
Simone Balloccu
Ondřej Dušek
Albert Gatt
Dimitra Gkatzia
David M. Howcroft
Ondřej Plátek
Adarsa Sivaprasad
80
5
0
17 Aug 2024
DIVE: Towards Descriptive and Diverse Visual Commonsense Generation
Jun-Hyung Park
Hyuntae Park
Youjin Kang
Eojin Jeon
SangKeun Lee
59
0
0
15 Aug 2024
See It All: Contextualized Late Aggregation for 3D Dense Captioning
Minjung Kim
Hyung Suk Lim
Seung Hwan Kim
Soonyoung Lee
Bumsoo Kim
Gunhee Kim
82
4
0
14 Aug 2024
Bi-directional Contextual Attention for 3D Dense Captioning
Minjung Kim
Hyung Suk Lim
Soonyoung Lee
Bumsoo Kim
Gunhee Kim
72
3
0
13 Aug 2024
Context-aware Visual Storytelling with Visual Prefix Tuning and Contrastive Learning
Yingjin Song
Denis Paperno
Albert Gatt
66
0
0
12 Aug 2024
Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?
Roshan S. Sharma
Suwon Shon
Mark Lindsey
Hira Dhamyal
Rita Singh
Bhiksha Raj
103
1
0
12 Aug 2024
Hyperbolic Learning with Multimodal Large Language Models
Paolo Mandica
Luca Franco
Konstantinos Kallidromitis
Suzanne Petryk
Fabio Galasso
85
3
0
09 Aug 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
71
0
0
09 Aug 2024