2303.15105
Vision Transformer with Quadrangle Attention
27 March 2023
Qiming Zhang
Jing Zhang
Yufei Xu
Dacheng Tao
ViT
Github (214★)
Papers citing
"Vision Transformer with Quadrangle Attention"
31 / 31 papers shown
LSNet: See Large, Focus Small
Ao Wang
Hui Chen
Zijia Lin
Jiawei Han
Guiguang Ding
98
0
0
29 Mar 2025
PVChat: Personalized Video Chat with One-Shot Learning
Yufei Shi
Weilong Yan
Gang Xu
Yumeng Li
Yongqian Li
Zechao Li
Fei Richard Yu
Ming Li
Si Yong Yeo
84
1
0
21 Mar 2025
DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding
Keyan Chen
Chenyang Liu
Bowen Chen
Wenyuan Li
Zhengxia Zou
Zhenwei Shi
78
3
0
20 Mar 2025
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
A. Nassar
Andres Marafioti
Matteo Omenetti
Maksym Lysak
Nikolaos Livathinos
...
Yusik Kim
A. Said Gurbuz
Michele Dolfi
Miquel Farré
Peter W. J. Staar
102
6
0
14 Mar 2025
Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models
Hao Yi
Qingyang Li
Yihan Hu
Fuzheng Zhang
Di Zhang
Yong Liu
VGen
122
0
0
25 Nov 2024
Realizing Video Summarization from the Path of Language-based Semantic Understanding
Kuan-Chen Mu
Zhi-Yi Chin
Wei-Chen Chiu
49
0
0
06 Oct 2024
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
Wanpeng Zhang
Zilong Xie
Yicheng Feng
Yijiang Li
Xingrun Xing
Sipeng Zheng
Zongqing Lu
MLLM
122
1
0
03 Oct 2024
ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models
Sombit Dey
Jan-Nico Zaech
Nikolay Nikolov
Luc Van Gool
Danda Pani Paudel
MoMe
VLM
151
5
0
23 Sep 2024
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization
Ling Xing
Hongyu Qu
Rui Yan
Xiangbo Shu
Jinhui Tang
161
2
0
12 Sep 2024
UNIT: Unifying Image and Text Recognition in One Vision Encoder
Yi Zhu
Yanpeng Zhou
Chunwei Wang
Yang Cao
Jianhua Han
Lu Hou
Hang Xu
ViT
VLM
114
4
0
06 Sep 2024
LMLT: Low-to-high Multi-Level Vision Transformer for Image Super-Resolution
Jeongsoo Kim
Jongho Nang
Junsuk Choe
ViT
93
4
0
05 Sep 2024
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Xiaowei Chi
Yatian Wang
Aosong Cheng
Pengjun Fang
Zeyue Tian
...
Wenhan Luo
Qifeng Chen
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
129
7
0
30 Jul 2024
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang
Garrett Bingham
Adams Wei Yu
Quoc V. Le
Thang Luong
Golnaz Ghiasi
MLLM
LRM
135
13
0
22 Jul 2024
PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions
Sihan Ma
Jing Zhang
Qiong Cao
Dacheng Tao
75
2
0
20 Jun 2024
Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling
Fengxiang Wang
H. Wang
Di Wang
Zonghao Guo
Zhenyu Zhong
Long Lan
Wenjing Yang
Jing Zhang
89
3
0
17 Jun 2024
HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model
Di Wang
Meiqi Hu
Yao Jin
Yuchun Miao
Jiaqi Yang
...
Lefei Zhang
Chen Wu
Di Lin
Dacheng Tao
Liangpei Zhang
164
27
0
17 Jun 2024
MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention
Prince Jha
Raghav Jain
Konika Mandal
Aman Chadha
Sriparna Saha
P. Bhattacharyya
58
8
0
08 Jun 2024
Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt
Zonghao Ying
Aishan Liu
Tianyuan Zhang
Zhengmin Yu
Siyuan Liang
Xianglong Liu
Dacheng Tao
AAML
116
40
0
06 Jun 2024
Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation
Yi Ma
Hongyu Liu
Haobo Wang
Heng Pan
Yingqing He
...
Ailing Zeng
Chengfei Cai
H. Shum
Wen Liu
Qifeng Chen
130
61
0
04 Jun 2024
Sharing Key Semantics in Transformer Makes Efficient Image Restoration
Bin Ren
Yawei Li
Christos Sakaridis
Rakesh Ranjan
Mengyuan Liu
Rita Cucchiara
Luc Van Gool
Ming-Hsuan Yang
N. Sebe
115
7
0
30 May 2024
LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation
Wentao Jiang
Jing Zhang
Di Wang
Qiming Zhang
Zengmao Wang
Bo Du
77
5
0
16 May 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
Amirhossein Kazerouni
Ilker Hacihaliloglu
Dorit Merhof
97
7
0
28 Mar 2024
MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining
Di Wang
Jing Zhang
Minqiang Xu
Lin Liu
Dongsheng Wang
...
Chengxi Han
Haonan Guo
Bo Du
Dacheng Tao
Lefei Zhang
83
52
0
20 Mar 2024
FViT: A Focal Vision Transformer with Gabor Filter
Yulong Shi
Mingwei Sun
Yongshuai Wang
Rui Wang
151
4
0
17 Feb 2024
EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention
Yulong Shi
Mingwei Sun
Yongshuai Wang
Hui Sun
Zengqiang Chen
101
4
0
10 Oct 2023
SparseSwin: Swin Transformer with Sparse Transformer Block
Krisna Pinasthika
Blessius Sheldo Putra Laksono
Riyandi Banovbi Putera Irsal
Syifa’ Hukma Shabiyya
N. Yudistira
ViT
82
19
0
11 Sep 2023
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
Zhuofan Xia
Xuran Pan
Shiji Song
Li Erran Li
Gao Huang
ViT
91
27
0
04 Sep 2023
ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts
Bilel Benjdira
Anis Koubaa
Anas M. Ali
LM&Ro
58
4
0
22 Aug 2023
ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution
Mingjin Zhang
Chi Zhang
Qiming Zhang
Jie-Ru Guo
Xinbo Gao
Jing Zhang
70
30
0
26 Jul 2023
Deep Image Matting: A Comprehensive Survey
Jizhizi Li
Jing Zhang
Dacheng Tao
VLM
111
14
0
10 Apr 2023
Distract Your Attention: Multi-head Cross Attention Network for Facial Expression Recognition
Zhengyao Wen
Wen-Long Lin
Tao Wang
Ge Xu
CVBM
188
219
0
15 Sep 2021