Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2502.11300
Cited By
CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships?
16 February 2025
Aashish Anantha Ramakrishnan
Aadarsh Anantha Ramakrishnan
Dongwon Lee
Re-assign community
ArXiv
PDF
HTML
Papers citing
"CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships?"
11 / 11 papers shown
Title
IRONIC: Coherence-Aware Reasoning Chains for Multi-Modal Sarcasm Detection
Aashish Anantha Ramakrishnan
Aadarsh Anantha Ramakrishnan
Dongwon Lee
LRM
19
0
0
22 May 2025
RONA: Pragmatically Diverse Image Captioning with Coherence Relations
Aashish Anantha Ramakrishnan
Aadarsh Anantha Ramakrishnan
Dongwon Lee
81
0
0
14 Mar 2025
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Z. F. Wu
Xiaokang Chen
Zizheng Pan
Xianglong Liu
Wen Liu
...
Xingkai Yu
Haowei Zhang
Liang Zhao
Yijiao Wang
Chong Ruan
MLLM
VLM
MoE
132
122
0
13 Dec 2024
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
Tristan Thrush
Ryan Jiang
Max Bartolo
Amanpreet Singh
Adina Williams
Douwe Kiela
Candace Ross
CoGe
82
415
0
07 Apr 2022
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
235
10,099
0
17 Jun 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
82
1,512
0
18 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
681
28,659
0
26 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
392
3,778
0
11 Feb 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
400
40,217
0
22 Oct 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
500
41,106
0
28 May 2020
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
275
2,346
0
20 Dec 2016
1