Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1411.5726
Cited By
v1
v2 (latest)
CIDEr: Consensus-based Image Description Evaluation
20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"CIDEr: Consensus-based Image Description Evaluation"
50 / 2,183 papers shown
Title
MACE: Leveraging Audio for Evaluating Audio Captioning Systems
Satvik Dixit
Soham Deshmukh
Bhiksha Raj
66
1
0
01 Nov 2024
Generative Emotion Cause Explanation in Multimodal Conversations
Lin Wang
Xiaocui Yang
Shi Feng
Daling Wang
Yifei Zhang
Zhitao Zhang
107
0
0
01 Nov 2024
Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP
Chen Huang
Skyler Seto
Samira Abnar
David Grangier
Navdeep Jaitly
J. Susskind
VLM
77
1
0
31 Oct 2024
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Bo Jiang
Shaoyu Chen
Bencheng Liao
Xingyu Zhang
Wei Yin
Qian Zhang
Chang Huang
Wen Liu
Xinyu Wang
VLM
MLLM
LRM
113
31
0
29 Oct 2024
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
Donghoon Kim
Gusang Lee
Kyuhong Shim
B. Shim
97
1
0
29 Oct 2024
MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding
Yuan Wang
Di Huang
Yaqi Zhang
Wanli Ouyang
J. Jiao
Xuetao Feng
Yan Zhou
Pengfei Wan
Shixiang Tang
Dan Xu
VGen
113
16
0
29 Oct 2024
What Factors Affect Multi-Modal In-Context Learning? An In-Depth Exploration
L. Qin
Qiguang Chen
Hao Fei
Zhi Chen
Min Li
Wanxiang Che
88
11
0
27 Oct 2024
Sensor2Text: Enabling Natural Language Interactions for Daily Activity Tracking Using Wearable Sensors
Wenqiang Chen
Jiaxuan Cheng
Leyao Wang
Wei Zhao
Wojciech Matusik
126
2
0
26 Oct 2024
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
Kim Sung-Bin
Oh Hyun-Bin
JungMok Lee
Arda Senocak
Joon Son Chung
Tae-Hyun Oh
MLLM
VLM
155
8
0
23 Oct 2024
Image-aware Evaluation of Generated Medical Reports
Gefen Dawidowicz
Elad Hirsch
A. Tal
64
1
0
22 Oct 2024
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features
Tian-Zi Niu
Zhen-Duo Chen
Xin Luo
Xin-Shun Xu
49
0
0
22 Oct 2024
MotionGlot: A Multi-Embodied Motion Generation Model
Sudarshan Harithas
Srinath Sridhar
174
2
0
22 Oct 2024
EVA: An Embodied World Model for Future Video Anticipation
Xiaowei Chi
Hengyuan Zhang
Chun-Kai Fan
Xingqun Qi
Rongyu Zhang
...
Chi-Min Chan
Wei Xue
Wenhan Luo
Shanghang Zhang
Yike Guo
VGen
91
8
0
20 Oct 2024
Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling
Minhyuk Seo
Hyunseo Koh
Jonghyun Choi
102
3
0
19 Oct 2024
ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions
Shailaja Keyur Sampat
Yezhou Yang
Chitta Baral
LM&Ro
83
0
0
17 Oct 2024
EmotionCaps: Enhancing Audio Captioning Through Emotion-Augmented Data Generation
Mithun Manivannan
Vignesh Nethrapalli
Mark Cartwright
62
1
0
15 Oct 2024
Efficient and Effective Universal Adversarial Attack against Vision-Language Pre-training Models
Fan Yang
Yihao Huang
Kaidi Wang
Ling Shi
G. Pu
Yang Liu
Haoran Wang
AAML
VLM
80
2
0
15 Oct 2024
When Does Perceptual Alignment Benefit Vision Representations?
Shobhita Sundaram
Stephanie Fu
Lukas Muttenthaler
Netanel Y. Tamir
Lucy Chai
Simon Kornblith
Trevor Darrell
Phillip Isola
113
22
1
14 Oct 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Rory Young
Nicolas Pugeault
AAML
136
5
0
14 Oct 2024
ChangeMinds: Multi-task Framework for Detecting and Describing Changes in Remote Sensing
Yuduo Wang
Weikang Yu
Michael K Kopp
Pedram Ghamisi
73
1
0
13 Oct 2024
ECIS-VQG: Generation of Entity-centric Information-seeking Questions from Videos
Arpan Phukan
Manish Gupta
Asif Ekbal
VGen
82
0
0
13 Oct 2024
BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation
Peijia Qin
Ruiyi Zhang
Pengtao Xie
62
2
0
13 Oct 2024
EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment
Chen Gao
Baining Zhao
Weichen Zhang
Jinzhu Mao
Jun Zhang
...
Jianjie Fang
Zile Zhou
Jinqiang Cui
Xinyu Chen
Yong Li
LM&Ro
102
15
0
12 Oct 2024
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs
Wenxi Chen
Ziyang Ma
Xiquan Li
Xuenan Xu
Yuzhe Liang
Zhisheng Zheng
Kai Yu
Xie Chen
100
7
0
12 Oct 2024
DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning
Xiquan Li
Wenxi Chen
Ziyang Ma
Xuenan Xu
Yuzhe Liang
Zhisheng Zheng
Qiuqiang Kong
Xie Chen
VLM
121
6
0
12 Oct 2024
GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning
Eileen Wang
Caren Han
Josiah Poon
61
0
0
12 Oct 2024
Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies
Yingqiang Gao
Lukas Fischer
Alexa Lintner
Sarah Ebling
69
1
0
11 Oct 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training
Sara Sarto
Nicholas Moratelli
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
80
4
0
09 Oct 2024
NaVIP: An Image-Centric Indoor Navigation Solution for Visually Impaired People
Jun Yu
Yifan Zhang
Badrinadh Aila
V. Namboodiri
106
1
0
08 Oct 2024
The Mystery of Compositional Generalization in Graph-based Generative Commonsense Reasoning
Xiyan Fu
Anette Frank
LRM
124
0
0
08 Oct 2024
An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment
Hugo Malard
Michel Olvera
Stéphane Lathuilière
S. Essid
VLM
66
0
0
08 Oct 2024
TRACE: Temporal Grounding Video LLM via Causal Event Modeling
Yongxin Guo
Jingyu Liu
Mingda Li
Xiaoying Tang
Qingbin Liu
Xiaoying Tang
119
19
0
08 Oct 2024
R-Bench: Are your Large Multimodal Model Robust to Real-world Corruptions?
Chunyi Li
Junxuan Zhang
Zicheng Zhang
H. Wu
Yuan Tian
...
Guo Lu
Xiaohong Liu
Xiongkuo Min
Weisi Lin
Guangtao Zhai
AAML
95
4
0
07 Oct 2024
CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection
Devank
Jayateja Kalla
Soma Biswas
67
2
0
06 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
219
37
0
04 Oct 2024
Self-eXplainable AI for Medical Image Analysis: A Survey and New Outlooks
Junlin Hou
Sicen Liu
Yequan Bie
Hongmei Wang
Andong Tan
Luyang Luo
Hao Chen
XAI
118
5
0
03 Oct 2024
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences
Genta Indra Winata
David Anugraha
Lucky Susanto
Garry Kuwanto
Derry Wijaya
169
11
0
03 Oct 2024
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
Minh Le
Chau Nguyen
Huy Nguyen
Quyen Tran
Trung Le
Nhat Ho
135
8
0
03 Oct 2024
Backdooring Vision-Language Models with Out-Of-Distribution Data
Weimin Lyu
Jiachen Yao
Saumya Gupta
Lu Pang
Tao Sun
Lingjie Yi
Lijie Hu
Haibin Ling
Chao Chen
VLM
AAML
139
8
0
02 Oct 2024
CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset
Xiao Wang
Fuling Wang
Yuehang Li
Qingchuan Ma
Shiao Wang
Bo Jiang
Chuanfu Li
Jin Tang
119
4
0
01 Oct 2024
Decoding the Echoes of Vision from fMRI: Memory Disentangling for Past Semantic Information
Runze Xia
Congchi Yin
Piji Li
77
1
0
30 Sep 2024
TROPE: TRaining-Free Object-Part Enhancement for Seamlessly Improving Fine-Grained Zero-Shot Image Captioning
Joshua Forster Feinglass
Yezhou Yang
63
0
0
30 Sep 2024
See Detail Say Clear: Towards Brain CT Report Generation via Pathological Clue-driven Representation Learning
Chengxin Zheng
Junzhong Ji
Yanzhao Shi
Xiaodan Zhang
Liangqiong Qu
3DV
MedIm
66
3
0
29 Sep 2024
Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding
Xiao Wang
Jianlong Wu
Zijia Lin
Fuzheng Zhang
Di Zhang
Liqiang Nie
VGen
65
3
0
29 Sep 2024
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
Kazuki Matsuda
Yuiga Wada
Komei Sugiura
61
1
0
28 Sep 2024
TrojVLM: Backdoor Attack Against Vision Language Models
Weimin Lyu
Lu Pang
Tengfei Ma
Haibin Ling
Chao Chen
MLLM
97
11
0
28 Sep 2024
Evaluation of Large Language Models for Summarization Tasks in the Medical Domain: A Narrative Review
Emma Croxford
Yanjun Gao
Nicholas Pellegrino
Karen K. Wong
Graham Wills
Elliot First
Frank J. Liao
Cherodeep Goswami
Brian Patterson
Majid Afshar
HILM
ELM
LM&MA
129
1
0
26 Sep 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Ye Liu
Zongyang Ma
Zhongang Qi
Yang Wu
Ying Shan
Chang Wen Chen
107
23
0
26 Sep 2024
Inferring Alt-text For UI Icons With Large Language Models During App Development
Sabrina Haque
Christoph Csallner
VLM
67
0
0
26 Sep 2024
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
Soeun Lee
Si-Woo Kim
Taewhan Kim
Dong-Jin Kim
CLIP
VLM
59
0
0
26 Sep 2024
Previous
1
2
3
4
5
6
...
42
43
44
Next