2211.07636
Cited By
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
14 November 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
Papers citing
"EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"
50 / 507 papers shown
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong
Delong Ran
Jinyuan Liu
Conglei Wang
Tianshuo Cong
Anyu Wang
Sisi Duan
Xiaoyun Wang
MLLM
129
118
0
09 Nov 2023
OtterHD: A High-Resolution Multi-modality Model
Bo-wen Li
Peiyuan Zhang
Jingkang Yang
Yuanhan Zhang
Fanyi Pu
Ziwei Liu
VLM
MLLM
35
65
0
07 Nov 2023
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation
Xuwei Xu
Sen Wang
Yudong Chen
Yanping Zheng
Zhewei Wei
Jiajun Liu
ViT
27
8
0
06 Nov 2023
Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
Jameel Hassan
Hanan Gani
Noor Hussein
Muhammad Uzair Khattak
Muzammal Naseer
Fahad Shahbaz Khan
Salman Khan
VLM
OOD
58
61
0
02 Nov 2023
Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly
Qizhang Li
Yiwen Guo
Wangmeng Zuo
Hao Chen
ELM
AAML
35
2
0
02 Nov 2023
AiluRus: A Scalable ViT Framework for Dense Prediction
Jin Li
Yaoming Wang
Xiaopeng Zhang
Bowen Shi
Dongsheng Jiang
Chenglin Li
Wenrui Dai
Hongkai Xiong
Qi Tian
57
5
0
02 Nov 2023
CapsFusion: Rethinking Image-Text Data at Scale
Qiying Yu
Quan-Sen Sun
Xiaosong Zhang
Yufeng Cui
Fan Zhang
Yue Cao
Xinlong Wang
Jingjing Liu
VLM
23
54
0
31 Oct 2023
DDC-PIM: Efficient Algorithm/Architecture Co-design for Doubling Data Capacity of SRAM-based Processing-In-Memory
Cenlin Duan
Jianlei Yang
Xiaolin He
Yingjie Qi
Yikun Wang
...
Bonan Yan
Xueyan Wang
Xiaotao Jia
Weitao Pan
Weisheng Zhao
16
5
0
31 Oct 2023
Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone
Zeyinzi Jiang
Chaojie Mao
Ziyuan Huang
Ao Ma
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
30
15
0
30 Oct 2023
Open-NeRF: Towards Open Vocabulary NeRF Decomposition
Hao Zhang
Fang Li
Narendra Ahuja
22
11
0
25 Oct 2023
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Haoxiang Wang
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Mehrdad Farajtabar
Sachin Mehta
Mohammad Rastegari
Oncel Tuzel
Hadi Pouransari
VLM
32
67
0
23 Oct 2023
MSFormer: A Skeleton-multiview Fusion Method For Tooth Instance Segmentation
Yuan Li
Huan Liu
Y. Tao
Xiangyang He
Haifeng Li
Xiaohu Guo
Hai Lin
27
0
0
23 Oct 2023
Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection
Lingchen Meng
Xiyang Dai
Jianwei Yang
Dongdong Chen
Yinpeng Chen
Mengchen Liu
Yi-Ling Chen
Zuxuan Wu
Lu Yuan
Yu-Gang Jiang
16
6
0
18 Oct 2023
Beyond Segmentation: Road Network Generation with Multi-Modal LLMs
Sumedh Rasal
Sanjay K. Boddhu
32
5
0
15 Oct 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
160
441
0
14 Oct 2023
Uni3D: Exploring Unified 3D Representation at Scale
Junsheng Zhou
Jinsheng Wang
Baorui Ma
Yu-Shen Liu
Tiejun Huang
Xinlong Wang
40
88
0
10 Oct 2023
On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets
Ning Liao
Shaofeng Zhang
Renqiu Xia
Min Cao
Yu Qiao
Junchi Yan
MLLM
34
0
0
10 Oct 2023
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models
Archiki Prasad
Elias Stengel-Eskin
Mohit Bansal
ReLM
LRM
28
8
0
09 Oct 2023
No Token Left Behind: Efficient Vision Transformer via Dynamic Token Idling
Xuwei Xu
Changlin Li
Yudong Chen
Xiaojun Chang
Jiajun Liu
Sen Wang
ViT
21
5
0
09 Oct 2023
Plug n' Play: Channel Shuffle Module for Enhancing Tiny Vision Transformers
Xuwei Xu
Sen Wang
Yudong Chen
Jiajun Liu
ViT
21
1
0
09 Oct 2023
Improved Baselines with Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM
MLLM
61
2,429
0
05 Oct 2023
Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction
Yiren Jian
Tingkai Liu
Yunzhe Tao
Chunhui Zhang
Soroush Vosoughi
HX Yang
VLM
20
7
0
05 Oct 2023
Text-image Alignment for Diffusion-based Perception
Neehar Kondapaneni
Markus Marks
Manuel Knott
Rogério Guimarães
Pietro Perona
VLM
DiffM
24
32
0
29 Sep 2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
Pan Zhang
Xiaoyi Wang
Bin Wang
Yuhang Cao
Chao Xu
...
Conghui He
Xingcheng Zhang
Yu Qiao
Da Lin
Jiaqi Wang
MLLM
80
222
0
26 Sep 2023
MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection
Kemal Oksuz
Selim Kuzucu
Tom Joy
P. Dokania
MoE
22
5
0
26 Sep 2023
Viewpoint Integration and Registration with Vision Language Foundation Model for Image Change Understanding
Xiaonan Lu
Jianlong Yuan
Ruigang Niu
Yuan Hu
Fan Wang
21
1
0
15 Sep 2023
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
Haozhe Zhao
Zefan Cai
Shuzheng Si
Xiaojian Ma
Kaikai An
Liang Chen
Zixuan Liu
Sheng Wang
Wenjuan Han
Baobao Chang
MLLM
VLM
28
133
0
14 Sep 2023
Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation
Yunhao Ge
Lyne Tchapmi
Brian Nlong Zhao
Neel Joshi
Laurent Itti
Vibhav Vineet
DiffM
35
14
0
12 Sep 2023
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
Yang Jin
Kun Xu
Kun Xu
Liwei Chen
Chao Liao
...
Xiaoqiang Lei
Di Zhang
Wenwu Ou
Kun Gai
Yadong Mu
MLLM
VLM
16
41
0
09 Sep 2023
Do We Still Need Non-Maximum Suppression? Accurate Confidence Estimates and Implicit Duplication Modeling with IoU-Aware Calibration
Johannes Gilg
Torben Teepe
Fabian Herzog
Philipp Wolters
Gerhard Rigoll
13
1
0
06 Sep 2023
Image Aesthetics Assessment via Learnable Queries
Zhiwei Xiong
Yunfan Zhang
Zhiqi Shen
Peiran Ren
Han Yu
9
4
0
06 Sep 2023
NICE: CVPR 2023 Challenge on Zero-shot Image Captioning
Taehoon Kim
Pyunghwan Ahn
Sangyun Kim
Sihaeng Lee
Mark A Marsden
...
Yujin Wang
Yimu Wang
Tiancheng Gu
Xingchang Lv
Mingmao Sun
VLM
17
4
0
05 Sep 2023
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
Zhuofan Xia
Xuran Pan
Shiji Song
Li Erran Li
Gao Huang
ViT
27
24
0
04 Sep 2023
RevColV2: Exploring Disentangled Representations in Masked Image Modeling
Qi Han
Yuxuan Cai
Xiangyu Zhang
35
7
0
02 Sep 2023
Contrastive Feature Masking Open-Vocabulary Vision Transformer
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD
VLM
23
27
0
02 Sep 2023
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Yupan Huang
Zaiqiao Meng
Fangyu Liu
Yixuan Su
Nigel Collier
Yutong Lu
MLLM
41
22
0
31 Aug 2023
A General-Purpose Self-Supervised Model for Computational Pathology
Richard J. Chen
Tong Ding
Ming Y. Lu
Drew F. K. Williamson
Guillaume Jaume
...
Judy J. Wang
Walt Williams
L. Le
Georg Gerber
Faisal Mahmood
MedIm
28
42
0
29 Aug 2023
VIGC: Visual Instruction Generation and Correction
Bin Wang
Fan Wu
Xiao Han
Jiahui Peng
Huaping Zhong
...
Xiao-wen Dong
Weijia Li
Wei Li
Jiaqi Wang
Conghui He
MLLM
38
63
0
24 Aug 2023
Spatial Transform Decoupling for Oriented Object Detection
Hongtian Yu
Yunjie Tian
QiXiang Ye
Yunfan Liu
40
26
0
21 Aug 2023
ViT-Lens: Initiating Omni-Modal Exploration through 3D Insights
Weixian Lei
Yixiao Ge
Jianfeng Zhang
Dylan Sun
Kun Yi
Ying Shan
Mike Zheng Shou
33
1
0
20 Aug 2023
A Unified Interactive Model Evaluation for Classification, Object Detection, and Instance Segmentation in Computer Vision
Changjian Chen
Yukai Guo
Fengyuan Tian
Siyi Liu
Weikai Yang
Zhao-Ming Wang
Jing Wu
Hang Su
Hanspeter Pfister
Shixia Liu
20
15
0
09 Aug 2023
High-Level Parallelism and Nested Features for Dynamic Inference Cost and Top-Down Attention
A. Kelm
Niels Hannemann
Bruno Heberle
Lucas Schmidt
Tim Rolff
Christian Wilms
Ehsan Yaghoubi
Simone Frintrop
21
0
0
09 Aug 2023
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
Juncheng Li
Kaihang Pan
Zhiqi Ge
Minghe Gao
Wei Ji
Wenqiao Zhang
Tat-Seng Chua
Siliang Tang
Hanwang Zhang
Yueting Zhuang
MLLM
32
68
0
08 Aug 2023
Tiny LVLM-eHub: Early Multimodal Experiments with Bard
Wenqi Shao
Yutao Hu
Peng Gao
Meng Lei
Kaipeng Zhang
...
Peng-Tao Xu
Siyuan Huang
Hongsheng Li
Yuning Qiao
Ping Luo
VLM
MLLM
32
2
0
07 Aug 2023
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
Weihao Yu
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
Zicheng Liu
Xinchao Wang
Lijuan Wang
MLLM
45
607
0
04 Aug 2023
A Parameter-efficient Multi-subject Model for Predicting fMRI Activity
Connor Lane
Gregory Kiar
22
2
0
04 Aug 2023
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
Weiyun Wang
Min Shi
Qingyun Li
Wen Wang
Zhenhang Huang
...
Zhiguo Cao
Yushi Chen
Tong Lu
Jifeng Dai
Yu Qiao
LRM
MLLM
48
84
0
03 Aug 2023
DETR Doesn't Need Multi-Scale or Locality Design
Yutong Lin
Yuhui Yuan
Zheng-Wei Zhang
Chen Li
Nanning Zheng
Han Hu
37
5
0
03 Aug 2023
RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension
Qiang-feng Zhou
Chaohui Yu
Shaofeng Zhang
Sitong Wu
Zhibin Wang
Fan Wang
34
27
0
03 Aug 2023
Guided Distillation for Semi-Supervised Instance Segmentation
Tariq Berrada
Camille Couprie
Alahari Karteek
Jakob Verbeek
29
10
0
03 Aug 2023