2204.03645
DaViT: Dual Attention Vision Transformers
7 April 2022
Mingyu Ding
Bin Xiao
Noel Codella
Ping Luo
Jingdong Wang
Lu Yuan
ViT
Papers citing
"DaViT: Dual Attention Vision Transformers"
50 / 128 papers shown
Title
A Semantic-Enhanced Heterogeneous Graph Learning Method for Flexible Objects Recognition
Kunshan Yang
Wenwei Luo
Yuguo Hu
Jiafu Yan
Mengmeng Jing
Lin Zuo
36
0
0
28 Mar 2025
Beyond Accuracy: What Matters in Designing Well-Behaved Models?
Robin Hesse
Doğukan Bağcı
Bernt Schiele
Simone Schaub-Meyer
Stefan Roth
VLM
62
0
0
21 Mar 2025
MEET: A Million-Scale Dataset for Fine-Grained Geospatial Scene Classification with Zoom-Free Remote Sensing Imagery
Yansheng Li
Yuning Wu
Gong Cheng
Chao Tao
Bo Dang
...
C. Zhang
Yao Liu
X. Tang
Jiayi Ma
Yongjun Zhang
50
2
0
14 Mar 2025
Florenz: Scaling Laws for Systematic Generalization in Vision-Language Models
Julian Spravil
Sebastian Houben
Sven Behnke
VLM
75
0
0
12 Mar 2025
CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution
Xin Liu
Jie Liu
J. Tang
Gangshan Wu
SupR
ViT
54
0
0
10 Mar 2025
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu
Sen Lin
MoE
135
1
0
10 Mar 2025
Enhancing Abnormality Grounding for Vision Language Models with Knowledge Descriptions
Jun Yu Li
Che Liu
Wenjia Bai
Rossella Arcucci
Cosmin I. Bercea
Julia A. Schnabel
43
0
0
05 Mar 2025
Revisit the Stability of Vanilla Federated Learning Under Diverse Conditions
Youngjoon Lee
J. Gong
Sun Choi
Joonhyuk Kang
FedML
Presented at ResearchTrend Connect | FedML on 23 Apr 2025
124
1
0
27 Feb 2025
Towards Accurate Unified Anomaly Segmentation
Wenxin Ma
Qingsong Yao
Xiang Zhang
Zhelong Huang
Zihang Jiang
S. Kevin Zhou
78
1
0
21 Jan 2025
VMamba: Visual State Space Model
Yue Liu
Yunjie Tian
Yuzhong Zhao
Hongtian Yu
Lingxi Xie
Yaowei Wang
Qixiang Ye
Jianbin Jiao
Yunfan Liu
Mamba
152
612
0
31 Dec 2024
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
Jiuhai Chen
Jianwei Yang
Haiping Wu
Dianqi Li
Jianfeng Gao
Tianyi Zhou
Bin Xiao
VLM
60
4
0
05 Dec 2024
Token Cropr: Faster ViTs for Quite a Few Tasks
Benjamin Bergner
C. Lippert
Aravindh Mahendran
ViT
VLM
71
0
0
01 Dec 2024
Breaking the Low-Rank Dilemma of Linear Attention
Qihang Fan
Huaibo Huang
Ran He
45
1
0
12 Nov 2024
GCI-ViTAL: Gradual Confidence Improvement with Vision Transformers for Active Learning on Label Noise
Moseli Motsóehli
Kyungim Baek
34
1
0
08 Nov 2024
Historical Test-time Prompt Tuning for Vision Foundation Models
Jingyi Zhang
Jiaxing Huang
Xiaoqin Zhang
Ling Shao
Shijian Lu
VLM
37
4
0
27 Oct 2024
Transforming Precision: A Comparative Analysis of Vision Transformers, CNNs, and Traditional ML for Knee Osteoarthritis Severity Diagnosis
Tasnim Sakib Apon
Md. Fahim-Ul-Islam
Nafiz Imtiaz Rafin
Joya Akter
Md. Golam Rabiul Alam
16
1
0
26 Oct 2024
Multi-Class Abnormality Classification Task in Video Capsule Endoscopy
Dev Rishi Verma
Vibhor Saxena
Dhruv Sharma
Arpan Gupta
24
1
0
25 Oct 2024
Efficient Deep Learning Board: Training Feedback Is Not All You Need
Lina Gong
Qi Gao
Peng Li
Mingqiang Wei
Fei Wu
OOD
32
0
0
17 Oct 2024
MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging
Noel C. F. Codella
Ying Jin
Shrey Jain
Yu Gu
Ho Hin Lee
...
Jenq-Neng Hwang
Thomas Lin
Ivan Tarapov
M. Lungren
Mu-Hsin Wei
LM&MA
VLM
MedIm
40
8
0
09 Oct 2024
Window-based Channel Attention for Wavelet-enhanced Learned Image Compression
Heng Xu
Bowen Hai
Yushun Tang
Zhihai He
18
0
0
21 Sep 2024
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension
Amaia Cardiel
Éloi Zablocki
Oriane Siméoni
Elias Ramzi
Matthieu Cord
VLM
28
0
0
18 Sep 2024
AceParse: A Comprehensive Dataset with Diverse Structured Texts for Academic Literature Parsing
Huawei Ji
Cheng Deng
Bo Xue
Zhouyang Jin
Jiaxin Ding
Xiaoying Gan
Luoyi Fu
Xinbing Wang
Chenghu Zhou
26
0
0
16 Sep 2024
Pluralistic Salient Object Detection
Xuelu Feng
Yunsheng Li
Dongdong Chen
Chunming Qiao
Junsong Yuan
Lu Yuan
G. Hua
37
1
0
04 Sep 2024
PointMT: Efficient Point Cloud Analysis with Hybrid MLP-Transformer Architecture
Qiang Zheng
Chao Zhang
Jian Sun
30
1
0
10 Aug 2024
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications
Tianfang Zhang
Lei Li
Yang Zhou
Wentao Liu
Chen Qian
Xiangyang Ji
ViT
30
12
0
07 Aug 2024
Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation
Hyunwoo Yu
Yubin Cho
Beoungwoo Kang
Seunghun Moon
Kyeongbo Kong
Suk-Ju Kang
30
3
0
24 Jul 2024
HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
Xiang Zhang
Yulun Zhang
Fisher Yu
42
15
0
08 Jul 2024
Flexible ViG: Learning the Self-Saliency for Flexible Object Recognition
Lin Zuo
Kunshan Yang
Xianlong Tian
Kunbin He
Yongqi Ding
Mengmeng Jing
27
1
0
06 Jun 2024
Building Vision Models upon Heat Conduction
Zhaozhi Wang
Yue Liu
Yunfan Liu
Hongtian Yu
Yaowei Wang
QiXiang Ye
ViT
VLM
55
0
0
26 May 2024
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
Chunjiang Ge
Sijie Cheng
Ziming Wang
Jiale Yuan
Yuan Gao
Jun Song
Shiji Song
Gao Huang
Bo Zheng
MLLM
VLM
28
17
0
24 May 2024
Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Vision Transformer
Qihang Fan
Huaibo Huang
Mingrui Chen
Ran He
51
0
0
22 May 2024
Vision Transformer with Sparse Scan Prior
Qihang Fan
Huaibo Huang
Mingrui Chen
Ran He
ViT
48
5
0
22 May 2024
MCM: Multi-condition Motion Synthesis Framework
Zeyu Ling
Bo Han
Yongkang Wang
Han Lin
Mohan S. Kankanhalli
Weidong Geng
35
1
0
19 Apr 2024
CU-Mamba: Selective State Space Models with Channel Learning for Image Restoration
Rui Deng
Tianpei Gu
Mamba
42
16
0
17 Apr 2024
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Ali Behrouz
Michele Santacatterina
Ramin Zabih
44
31
0
29 Mar 2024
ViTAR: Vision Transformer with Any Resolution
Qihang Fan
Quanzeng You
Xiaotian Han
Yongfei Liu
Yunzhe Tao
Huaibo Huang
Ran He
Hongxia Yang
ViT
44
14
0
27 Mar 2024
Heracles: A Hybrid SSM-Transformer Model for High-Resolution Image and Time-Series Analysis
Badri N. Patro
Suhas Ranganath
Vinay P. Namboodiri
Vijay Srinivas Agneeswaran
43
2
0
26 Mar 2024
Tiny Models are the Computational Saver for Large Models
Qingyuan Wang
B. Cardiff
Antoine Frappé
Benoît Larras
Deepu John
41
2
0
26 Mar 2024
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
Chenhongyi Yang
Zehui Chen
Miguel Espinosa
Linus Ericsson
Zhenyu Wang
Jiaming Liu
Elliot J. Crowley
Mamba
36
86
0
26 Mar 2024
HIRI-ViT: Scaling Vision Transformer with High Resolution Inputs
Ting Yao
Yehao Li
Yingwei Pan
Tao Mei
ViT
28
15
0
18 Mar 2024
AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation
Haonan Wang
Qixiang Zhang
Yi Li
Xiaomeng Li
43
16
0
04 Mar 2024
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Chien-Yao Wang
I-Hau Yeh
Hongpeng Liao
54
1,151
0
21 Feb 2024
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
Shufan Li
Harkanwar Singh
Aditya Grover
Mamba
92
56
0
08 Feb 2024
Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives
Sheng Luo
Wei Chen
Wanxin Tian
Rui Liu
Luanxuan Hou
...
Ling Shao
Yi Yang
Bojun Gao
Qun Li
Guobin Wu
51
13
0
05 Feb 2024
Learning to Prompt Segment Anything Models
Jiaxing Huang
Kai Jiang
Jingyi Zhang
Han Qiu
Lewei Lu
Shijian Lu
Eric P. Xing
VLM
LRM
45
7
0
09 Jan 2024
TPC-ViT: Token Propagation Controller for Efficient Vision Transformer
Wentao Zhu
23
2
0
03 Jan 2024
Factorization Vision Transformer: Modeling Long Range Dependency with Local Window Cost
Haolin Qin
Daquan Zhou
Tingfa Xu
Ziyang Bian
Jianan Li
29
9
0
14 Dec 2023
TransMed: Large Language Models Enhance Vision Transformer for Biomedical Image Classification
Kaipeng Zheng
Weiran Huang
Lichao Sun
LM&MA
MedIm
VLM
29
0
0
12 Dec 2023
Segment and Caption Anything
Xiaoke Huang
Jianfeng Wang
Yansong Tang
Zheng Zhang
Han Hu
Jiwen Lu
Lijuan Wang
Zicheng Liu
MLLM
VLM
26
18
0
01 Dec 2023
SCHEME: Scalable Channel Mixer for Vision Transformers
Deepak Sridhar
Yunsheng Li
Nuno Vasconcelos
44
0
0
01 Dec 2023