CIDEr: Consensus-based Image Description Evaluation

20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,136 papers shown
Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding
Xiao Wang
Jianlong Wu
Zijia Lin
Fuzheng Zhang
Di Zhang
Liqiang Nie
VGen
37
1
0
29 Sep 2024
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
Kazuki Matsuda
Yuiga Wada
Komei Sugiura
31
1
0
28 Sep 2024
TrojVLM: Backdoor Attack Against Vision Language Models
Weimin Lyu
Lu Pang
Tengfei Ma
Haibin Ling
Chao Chen
MLLM
37
7
0
28 Sep 2024
Evaluation of Large Language Models for Summarization Tasks in the Medical Domain: A Narrative Review
Emma Croxford
Yanjun Gao
Nicholas Pellegrino
Karen K. Wong
Graham Wills
Elliot First
Frank J. Liao
Cherodeep Goswami
Brian Patterson
Majid Afshar
HILM
ELM
LM&MA
37
1
0
26 Sep 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Ye Liu
Zongyang Ma
Zhongang Qi
Yang Wu
Ying Shan
Chang Wen Chen
39
16
0
26 Sep 2024
Inferring Alt-text For UI Icons With Large Language Models During App Development
Sabrina Haque
Christoph Csallner
VLM
36
0
0
26 Sep 2024
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
Soeun Lee
Si-Woo Kim
Taewhan Kim
Dong-Jin Kim
CLIP
VLM
31
0
0
26 Sep 2024
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
60
11
0
26 Sep 2024
Domain-Independent Automatic Generation of Descriptive Texts for Time-Series Data
Kota Dohi
Aoi Ito
Harsh Purohit
Tomoya Nishida
Takashi Endo
Y. Kawaguchi
32
3
0
25 Sep 2024
Overview of the First Shared Task on Clinical Text Generation: RRG24 and "Discharge Me!"
Justin Xu
Zhihong Chen
Andrew Johnston
Louis Blankemeier
Maya Varma
...
Ankit Modi
Robert Lloyd
Benjamin Hopkins
Curtis Langlotz
Jean-Benoit Delbrouck
LM&MA
44
25
0
25 Sep 2024
A-VL: Adaptive Attention for Large Vision-Language Models
Junyang Zhang
Mu Yuan
Ruiguang Zhong
Puhan Luo
Huiyou Zhan
Ningkang Zhang
Chengchen Hu
Xiangyang Li
VLM
43
1
0
23 Sep 2024
@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology
Xin Jiang
Junwei Zheng
Ruiping Liu
Jiahang Li
Jiaming Zhang
Sven Matthiesen
Rainer Stiefelhagen
VLM
28
0
0
21 Sep 2024
Enhancing Advanced Visual Reasoning Ability of Large Language Models
Zhiyuan Li
Dongnan Liu
Chaoyi Zhang
Heng Wang
Tengfei Xue
Weidong Cai
VLM
LRM
57
6
0
21 Sep 2024
HUT: A More Computation Efficient Fine-Tuning Method With Hadamard Updated Transformation
Geyuan Zhang
Xiaofei Zhou
Chuheng Chen
29
0
0
20 Sep 2024
CLAIR-A: Leveraging Large Language Models to Judge Audio Captions
Tsung-Han Wu
Joseph E. Gonzalez
Trevor Darrell
David M. Chan
22
2
0
19 Sep 2024
Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering
Youngsun Lim
Hojun Choi
Hyunjung Shim
HILM
EGVM
MLLM
44
0
0
19 Sep 2024
Enhancing Perception of Key Changes in Remote Sensing Image Change Captioning
Cong Yang
Zuchao Li
Hongzan Jiao
Zhi Gao
Lefei Zhang
37
1
0
19 Sep 2024
KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph
Yanbei Jiang
Krista A. Ehinger
Jey Han Lau
SLR
33
0
0
17 Sep 2024
Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models
Bingchen Liu
Ehsan Akhgari
Alexander Visheratin
Aleks Kamko
Linmiao Xu
Shivam Shrirao
Joao Souza
Suhail Doshi
Daiqing Li
DiffM
MLLM
31
47
0
16 Sep 2024
Video Token Sparsification for Efficient Multimodal LLMs in Autonomous Driving
Yunsheng Ma
Amr Abdelraouf
Rohit Gupta
Ziran Wang
Kyungtae Han
31
3
0
16 Sep 2024
Towards Diverse and Efficient Audio Captioning via Diffusion Models
Manjie Xu
Chenxing Li
Xinyi Tu
Yong Ren
Ruibo Fu
Wei Liang
Dong Yu
DiffM
49
1
0
14 Sep 2024
ChangeChat: An Interactive Model for Remote Sensing Change Analysis via Multimodal Instruction Tuning
Pei Deng
Wenqian Zhou
Hanlin Wu
26
2
0
13 Sep 2024
Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks
Md Zarif Hossain
Ahmed Imteaj
AAML
VLM
48
3
0
11 Sep 2024
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
Yang Liu
Pengxiang Ding
Siteng Huang
Min Zhang
Han Zhao
Donglin Wang
40
7
0
11 Sep 2024
Hint-AD: Holistically Aligned Interpretability in End-to-End Autonomous Driving
Kairui Ding
Boyuan Chen
Yuchen Su
Huan-ang Gao
Bu Jin
...
Wuqiang Zhang
Xiaohui Li
Paul Barsch
Hongyang Li
Hao Zhao
58
3
0
10 Sep 2024
Enhancing Long Video Understanding via Hierarchical Event-Based Memory
Dingxin Cheng
Mingda Li
Jingyu Liu
Yongxin Guo
Bin Jiang
Qingbin Liu
Xi Chen
Bo Zhao
38
4
0
10 Sep 2024
Spatially-Aware Speaker for Vision-and-Language Navigation Instruction Generation
Muraleekrishna Gopinathan
Martin Masek
Jumana Abu-Khalaf
David Suter
LM&Ro
31
1
0
09 Sep 2024
KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models
Yingshu Li
Zhanyu Wang
Yunyi Liu
Lei Wang
Lingqiao Liu
Luping Zhou
33
3
0
09 Sep 2024
FODA-PG for Enhanced Medical Imaging Narrative Generation: Adaptive Differentiation of Normal and Abnormal Attributes
Kai Shu
Yuzhuo Jia
Ziyang Zhang
Jiechao Gao
MedIm
32
0
0
06 Sep 2024
Question-Answering Dense Video Events
Hangyu Qin
Junbin Xiao
Angela Yao
VLM
77
1
0
06 Sep 2024
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Yunze Man
Shuhong Zheng
Zhipeng Bao
M. Hebert
Liang-Yan Gui
Yu-xiong Wang
75
15
0
05 Sep 2024
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
Manu Gaur
Darshan Singh
Makarand Tapaswi
130
1
0
04 Sep 2024
Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering
Su Hyeon Lim
Minkuk Kim
Hyeon Bae Kim
Seong Tae Kim
ReLM
LRM
45
0
0
30 Aug 2024
Medical Report Generation Is A Multi-label Classification Problem
Yijian Fan
Zhenbang Yang
Rui Liu
Mingjie Li
Xiaojun Chang
MedIm
35
1
0
30 Aug 2024
LLaVA-Chef: A Multi-modal Generative Model for Food Recipes
Fnu Mohbat
Mohammed J. Zaki
32
7
0
29 Aug 2024
See or Guess: Counterfactually Regularized Image Captioning
Qian Cao
Xu Chen
Ruihua Song
Xiting Wang
Xinting Huang
Yuchen Ren
CML
33
1
0
29 Aug 2024
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV
VLM
25
0
0
28 Aug 2024
AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models
Fanglong Yao
Yuanchang Yue
Youzhi Liu
Xian Sun
Kun Fu
VGen
EgoV
29
6
0
28 Aug 2024
Fine-grained length controllable video captioning with ordinal embeddings
Tomoya Nitta
Takumi Fukuzawa
Toru Tamaki
45
0
0
27 Aug 2024
What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation
Dingyi Yang
Qin Jin
44
5
0
26 Aug 2024
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
Nicholas Moratelli
Davide Caffagni
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
CLIP
31
3
0
26 Aug 2024
Semantic Alignment for Multimodal Large Language Models
Tao Wu
Mengze Li
Jingyuan Chen
Wei Ji
Wang Lin
Jinyang Gao
Kun Kuang
Zhou Zhao
Fei Wu
43
4
0
23 Aug 2024
TRRG: Towards Truthful Radiology Report Generation With Cross-modal Disease Clue Enhanced Large Language Model
Yuhao Wang
Chao Hao
Yawen Cui
Xinqi Su
Weicheng Xie
Tao Tan
Zitong Yu
LM&MA
MedIm
33
0
0
22 Aug 2024
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Chunting Zhou
Lili Yu
Arun Babu
Kushal Tirumala
Michihiro Yasunaga
Leonid Shamis
Jacob Kahn
Xuezhe Ma
Luke Zettlemoyer
Omer Levy
DiffM
42
147
0
20 Aug 2024
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
Haoning Wu
Shaocheng Shen
Qiang Hu
Xiaoyun Zhang
Ya Zhang
Yanfeng Wang
40
10
0
20 Aug 2024
R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation
Xiao Wang
Yuehang Li
Fuling Wang
Shiao Wang
Chuanfu Li
Bo Jiang
MedIm
44
6
0
19 Aug 2024
Quality Assessment in the Era of Large Models: A Survey
Zicheng Zhang
Yingjie Zhou
Chunyi Li
Baixuan Zhao
Xiaohong Liu
Guangtao Zhai
47
10
0
17 Aug 2024
Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices
Patrícia Schmidtová
Saad Mahamood
Simone Balloccu
Ondřej Dušek
Albert Gatt
Dimitra Gkatzia
David M. Howcroft
Ondřej Plátek
Adarsa Sivaprasad
45
3
0
17 Aug 2024
DIVE: Towards Descriptive and Diverse Visual Commonsense Generation
Jun-Hyung Park
Hyuntae Park
Youjin Kang
Eojin Jeon
SangKeun Lee
32
0
0
15 Aug 2024
See It All: Contextualized Late Aggregation for 3D Dense Captioning
Minjung Kim
Hyung Suk Lim
Seung Hwan Kim
Soonyoung Lee
Bumsoo Kim
Gunhee Kim
55
4
0
14 Aug 2024