Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.15389
Cited By
EVA-CLIP: Improved Training Techniques for CLIP at Scale
27 March 2023
Quan-Sen Sun
Yuxin Fang
Ledell Yu Wu
Xinlong Wang
Yue Cao
CLIP
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"EVA-CLIP: Improved Training Techniques for CLIP at Scale"
50 / 360 papers shown
Title
Zero-shot Action Localization via the Confidence of Large Vision-Language Models
Josiah Aklilu
Xiaohan Wang
Serena Yeung-Levy
62
1
0
18 Oct 2024
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Chengyue Wu
Xiaokang Chen
Z. F. Wu
Yiyang Ma
Xingchao Liu
...
Wen Liu
Zhenda Xie
Xingkai Yu
Chong Ruan
Ping Luo
AI4TS
60
77
0
17 Oct 2024
MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding
Yue Cao
Yangzhou Liu
Zhe Chen
Guangchen Shi
Wenhai Wang
Danhuai Zhao
Tong Lu
51
5
0
15 Oct 2024
It's Just Another Day: Unique Video Captioning by Discriminative Prompting
Toby Perrett
Tengda Han
Dima Damen
Andrew Zisserman
19
3
0
15 Oct 2024
Browsing without Third-Party Cookies: What Do You See?
Maxwell Lin
Shihan Lin
Helen Wu
Karen Wang
Xiaowei Yang
BDL
56
8
0
14 Oct 2024
LG-CAV: Train Any Concept Activation Vector with Language Guidance
Qihan Huang
Jie Song
Mengqi Xue
H. Zhang
Bingde Hu
Huiqiong Wang
Hao Jiang
Xingen Wang
Xiuming Zhang
VLM
34
0
0
14 Oct 2024
big.LITTLE Vision Transformer for Efficient Visual Recognition
He Guo
Yulong Wang
Zixuan Ye
Jifeng Dai
Yuwen Xiong
ViT
52
0
0
14 Oct 2024
TULIP: Token-length Upgraded CLIP
Ivona Najdenkoska
Mohammad Mahdi Derakhshani
Yuki M. Asano
Nanne van Noord
Marcel Worring
Cees G. M. Snoek
VLM
48
3
0
13 Oct 2024
Foundation Model-Powered 3D Few-Shot Class Incremental Learning via Training-free Adaptor
Sahar Ahmadi
A. Cheraghian
Morteza Saberi
Md. Towsif Abir
Hamidreza Dastmalchi
Farookh Hussain
Shafin Rahman
3DPC
28
2
0
11 Oct 2024
MiRAGeNews: Multimodal Realistic AI-Generated News Detection
Runsheng Huang
Liam Dugan
Yuqing Yang
Chris Callison-Burch
44
3
0
11 Oct 2024
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate
Qidong Huang
Xiaoyi Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Jiaqi Wang
Dahua Lin
Weiming Zhang
Nenghai Yu
57
5
0
09 Oct 2024
Temporal Image Caption Retrieval Competition -- Description and Results
Jakub Pokrywka
Piotr Wierzchoñ
Kornel Weryszko
Krzysztof Jassem
52
0
0
08 Oct 2024
Training-Free Open-Ended Object Detection and Segmentation via Attention as Prompts
Zhiwei Lin
Yongtao Wang
Zhi Tang
ObjD
VLM
30
2
0
08 Oct 2024
Enhancing Temporal Modeling of Video LLMs via Time Gating
Zi-Yuan Hu
Yiwu Zhong
Shijia Huang
M. Lyu
Liwei Wang
VLM
33
0
0
08 Oct 2024
TRACE: Temporal Grounding Video LLM via Causal Event Modeling
Yongxin Guo
Jingyu Liu
Mingda Li
Xiaoying Tang
Qingbin Liu
Xiaoying Tang
39
14
0
08 Oct 2024
DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination
Xuan Gong
Tianshi Ming
Xinpeng Wang
Zhihua Wei
MLLM
42
10
0
06 Oct 2024
Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models
Yufang Liu
Tao Ji
Changzhi Sun
Yuanbin Wu
Aimin Zhou
VLM
MLLM
43
2
0
04 Oct 2024
Toward a Holistic Evaluation of Robustness in CLIP Models
Weijie Tu
Weijian Deng
Tom Gedeon
VLM
41
5
0
02 Oct 2024
Solution for OOD-CV Workshop SSB Challenge 2024 (Open-Set Recognition Track)
Mingxu Feng
Dian Chao
Peng Zheng
Yang Yang
26
0
0
30 Sep 2024
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding
Heqing Zou
Tianze Luo
Guiyang Xie
Victor
Zhang
...
Guangcong Wang
Juanyang Chen
Zhuochen Wang
Hansheng Zhang
Huaijian Zhang
VLM
34
6
0
27 Sep 2024
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
60
11
0
26 Sep 2024
Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography
Yuexi Du
John Onofrey
Nicha Dvornek
VLM
53
1
0
26 Sep 2024
VL4AD: Vision-Language Models Improve Pixel-wise Anomaly Detection
Liangyu Zhong
Joachim Sicking
Fabian Hüger
Hanno Gottschalk
VLM
35
0
0
25 Sep 2024
MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios
Jiacheng Ruan
Wenzhen Yuan
Zehao Lin
Ning Liao
Zhiyu Li
Zhiyu Li
Ting Liu
Yuzhuo Fu
51
5
0
24 Sep 2024
M
2
^2
2
PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning
Taowen Wang
Yiyang Liu
James Liang
Junhan Zhao
Yiming Cui
...
Zenglin Xu
Cheng Han
Lifu Huang
Qifan Wang
Dongfang Liu
MLLM
VLM
LRM
27
16
0
24 Sep 2024
Efficient and Discriminative Image Feature Extraction for Universal Image Retrieval
Morris Florek
David Tschirschwitz
Björn Barz
Volker Rodehorst
VLM
33
0
0
20 Sep 2024
Understanding Multimodal Hallucination with Parameter-Free Representation Alignment
Yueqian Wang
Jianxin Liang
Yuxuan Wang
Huishuai Zhang
Dongyan Zhao
46
1
0
02 Sep 2024
CogVLM2: Visual Language Models for Image and Video Understanding
Wenyi Hong
Weihan Wang
Ming Ding
Wenmeng Yu
Qingsong Lv
...
Debing Liu
Bin Xu
Juanzi Li
Yuxiao Dong
Jie Tang
VLM
MLLM
50
88
0
29 Aug 2024
More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding
Yuan Tang
Xu Han
Xianzhi Li
Qiao Yu
Jinfeng Xu
Yixue Hao
Long Hu
Min Chen
43
1
0
28 Aug 2024
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi
Fuxiao Liu
Shihao Wang
Shijia Liao
Subhashree Radhakrishnan
...
Andrew Tao
Andrew Tao
Zhiding Yu
Guilin Liu
Guilin Liu
MLLM
30
53
0
28 Aug 2024
ParGo: Bridging Vision-Language with Partial and Global Views
An-Lan Wang
Bin Shan
Wei Shi
Kun-Yu Lin
Xiang Fei
Guozhi Tang
Lei Liao
Jingqun Tang
Can Huang
Wei-Shi Zheng
MLLM
VLM
90
14
0
23 Aug 2024
Building and better understanding vision-language models: insights and future directions
Hugo Laurençon
Andrés Marafioti
Victor Sanh
Léo Tronchon
VLM
42
61
0
22 Aug 2024
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model
Chaoya Jiang
Jia Hongrui
Haiyang Xu
Wei Ye
Mengfan Dong
Ming Yan
Ji Zhang
Fei Huang
Shikun Zhang
VLM
48
1
0
22 Aug 2024
EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning
Bohao Xing
Zitong Yu
Xin Liu
Kaishen Yuan
Qilang Ye
Weicheng Xie
Huanjing Yue
Jingyu Yang
Heikki Kälviäinen
50
10
0
21 Aug 2024
TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning
Bin Wang
Wenqian Wang
VLM
37
1
0
20 Aug 2024
CROME: Cross-Modal Adapters for Efficient Multimodal LLM
Sayna Ebrahimi
Sercan Ö. Arik
Tejas Nama
Tomas Pfister
44
1
0
13 Aug 2024
Efficient Test-Time Prompt Tuning for Vision-Language Models
Yuhan Zhu
Guozhen Zhang
Chen Xu
Haocheng Shen
Xiaoxin Chen
Gangshan Wu
Limin Wang
VLM
39
2
0
11 Aug 2024
VideoQA in the Era of LLMs: An Empirical Study
Junbin Xiao
Nanxin Huang
Hangyu Qin
Dongyang Li
Yicong Li
...
Zhulin Tao
Jianxing Yu
Liang Lin
Tat-Seng Chua
Angela Yao
28
10
0
08 Aug 2024
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data
Yichen Lu
Álvaro Huertas-García
Xuankai Chang
Hengwei Bian
Soumi Maiti
Shinji Watanabe
46
2
0
01 Aug 2024
EZSR: Event-based Zero-Shot Recognition
Yan Yang
Sehwan Kim
Dongxu Li
Y. Sun
31
0
0
31 Jul 2024
CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning
Yuexi Du
Brian Chang
Nicha Dvornek
MedIm
VLM
48
2
0
30 Jul 2024
GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models
Ali Abdollahi
Mahdi Ghaznavi
Mohammad Reza Karimi Nejad
Arash Mari Oriyad
Reza Abbasi
Ali Salesi
Melika Behjati
M. Rohban
M. Baghshah
CoGe
34
1
0
30 Jul 2024
OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance
Yongqiang Yao
Jingru Tan
Jiahao Hu
Feizhao Zhang
Xin Jin
...
Ruihao Gong
Pengfei Liu
Pengfei Liu
Dahua Lin
Ningyi Xu
VLM
52
1
0
30 Jul 2024
Diffusion Feedback Helps CLIP See Better
Wenxuan Wang
Quan-Sen Sun
Fan Zhang
Yepeng Tang
Jing Liu
Xinlong Wang
VLM
46
14
0
29 Jul 2024
ActivityCLIP: Enhancing Group Activity Recognition by Mining Complementary Information from Text to Supplement Image Modality
Guoliang Xu
Jianqin Yin
Feng Zhou
Yonghao Dang
VLM
41
0
0
29 Jul 2024
Exploring the Adversarial Robustness of CLIP for AI-generated Image Detection
Vincenzo De Rosa
Fabrizio Guillaro
Giovanni Poggi
D. Cozzolino
L. Verdoliva
AAML
65
5
0
28 Jul 2024
MMCLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training
Biao Wu
Yutong Xie
Zeyu Zhang
Minh Hieu Phan
Qi Chen
Ling-Hao Chen
Qi Wu
LM&MA
41
0
0
28 Jul 2024
UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models
Xinyu Pi
Mingyuan Wu
Jize Jiang
Haozhen Zheng
Beitong Tian
Chengxiang Zhai
Klara Nahrstedt
Zhiting Hu
VLM
41
1
0
25 Jul 2024
Unified Lexical Representation for Interpretable Visual-Language Alignment
Yifan Li
Yikai Wang
Yanwei Fu
Dongyu Ru
Zheng-Wei Zhang
Tong He
VLM
42
4
0
25 Jul 2024
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
Yangzhou Liu
Yue Cao
Zhangwei Gao
Weiyun Wang
Zhe Chen
...
Lewei Lu
Xizhou Zhu
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
57
23
0
22 Jul 2024
Previous
1
2
3
4
5
6
7
8
Next