ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2303.15105
  4. Cited By
Vision Transformer with Quadrangle Attention

Vision Transformer with Quadrangle Attention

27 March 2023
Qiming Zhang
Jing Zhang
Yufei Xu
Dacheng Tao
    ViT
ArXivPDFHTML

Papers citing "Vision Transformer with Quadrangle Attention"

36 / 36 papers shown
Title
LSNet: See Large, Focus Small
LSNet: See Large, Focus Small
Ao Wang
Hui Chen
Zijia Lin
J. Han
Guiguang Ding
42
0
0
29 Mar 2025
PVChat: Personalized Video Chat with One-Shot Learning
PVChat: Personalized Video Chat with One-Shot Learning
Yufei Shi
Weilong Yan
Gang Xu
Yumeng Li
Yongqian Li
ZeLin Li
Fei Richard Yu
Ming Li
Si Yong Yeo
43
0
0
21 Mar 2025
DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding
DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding
Keyan Chen
Chenyang Liu
Bowen Chen
Wenyuan Li
Zhengxia Zou
Zhenwei Shi
50
2
0
20 Mar 2025
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
A. Nassar
Andres Marafioti
Matteo Omenetti
Maksym Lysak
Nikolaos Livathinos
...
Yusik Kim
A. Said Gurbuz
Michele Dolfi
Miquel Farré
Peter W. J. Staar
58
3
0
14 Mar 2025
Video-Text Dataset Construction from Multi-AI Feedback: Promoting
  Weak-to-Strong Preference Learning for Video Large Language Models
Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models
Hao Yi
Qingyang Li
Yihan Hu
Fuzheng Zhang
Di Zhang
Yong Liu
VGen
71
0
0
25 Nov 2024
Realizing Video Summarization from the Path of Language-based Semantic
  Understanding
Realizing Video Summarization from the Path of Language-based Semantic Understanding
Kuan-Chen Mu
Zhi-Yi Chin
Wei-Chen Chiu
25
0
0
06 Oct 2024
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
Wanpeng Zhang
Zilong Xie
Yicheng Feng
Yijiang Li
Xingrun Xing
Sipeng Zheng
Zongqing Lu
MLLM
30
0
0
03 Oct 2024
ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models
ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models
Sombit Dey
Jan-Nico Zaech
Nikolay Nikolov
Luc Van Gool
Danda Pani Paudel
MoMe
VLM
56
4
0
23 Sep 2024
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization
Ling Xing
Hongyu Qu
Rui Yan
Xiangbo Shu
Jinhui Tang
45
1
0
12 Sep 2024
UNIT: Unifying Image and Text Recognition in One Vision Encoder
UNIT: Unifying Image and Text Recognition in One Vision Encoder
Yi Zhu
Yanpeng Zhou
Chunwei Wang
Yang Cao
Jianhua Han
Lu Hou
Hang Xu
ViT
VLM
34
4
0
06 Sep 2024
LMLT: Low-to-high Multi-Level Vision Transformer for Image
  Super-Resolution
LMLT: Low-to-high Multi-Level Vision Transformer for Image Super-Resolution
Jeongsoo Kim
Jongho Nang
Junsuk Choe
ViT
24
4
0
05 Sep 2024
MMTrail: A Multimodal Trailer Video Dataset with Language and Music
  Descriptions
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Xiaowei Chi
Yatian Wang
Aosong Cheng
Pengjun Fang
Zeyue Tian
...
Wenhan Luo
Qifeng Chen
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
72
7
0
30 Jul 2024
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal
  Reasoning
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang
Garrett Bingham
Adams Wei Yu
Quoc V. Le
Thang Luong
Golnaz Ghiasi
MLLM
LRM
37
9
0
22 Jul 2024
PoseBench: Benchmarking the Robustness of Pose Estimation Models under
  Corruptions
PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions
Sihan Ma
Jing Zhang
Qiong Cao
Dacheng Tao
32
2
0
20 Jun 2024
HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model
HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model
Di Wang
Meiqi Hu
Yao Jin
Yuchun Miao
Jiaqi Yang
...
Lefei Zhang
Chen Wu
Bo Du
Dacheng Tao
Liangpei Zhang
66
25
0
17 Jun 2024
MemeGuard: An LLM and VLM-based Framework for Advancing Content
  Moderation via Meme Intervention
MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention
Prince Jha
Raghav Jain
Konika Mandal
Aman Chadha
Sriparna Saha
P. Bhattacharyya
21
6
0
08 Jun 2024
Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt
Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt
Zonghao Ying
Aishan Liu
Tianyuan Zhang
Zhengmin Yu
Siyuan Liang
Xianglong Liu
Dacheng Tao
AAML
35
26
0
06 Jun 2024
Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait
  Animation
Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation
Y. Ma
Hongyu Liu
Haozhao Wang
Heng Pan
Yingqing He
...
Ailing Zeng
Chengfei Cai
H. Shum
Wei Liu
Qifeng Chen
36
52
0
04 Jun 2024
Sharing Key Semantics in Transformer Makes Efficient Image Restoration
Sharing Key Semantics in Transformer Makes Efficient Image Restoration
Bin Ren
Yawei Li
Jingyun Liang
Rakesh Ranjan
Mengyuan Liu
Rita Cucchiara
Luc Van Gool
Ming-Hsuan Yang
N. Sebe
43
5
0
30 May 2024
LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for
  Remote Sensing Image Interpretation
LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation
Wentao Jiang
Jing Zhang
Di Wang
Qiming Zhang
Zengmao Wang
Bo Du
37
5
0
16 May 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques
  and Insights
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
A. Kazerouni
I. Hacihaliloglu
Dorit Merhof
48
7
0
28 Mar 2024
MTP: Advancing Remote Sensing Foundation Model via Multi-Task
  Pretraining
MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining
Di Wang
Jing Zhang
Minqiang Xu
Lin Liu
Dongsheng Wang
...
Chengxi Han
Haonan Guo
Bo Du
Dacheng Tao
L. Zhang
37
44
0
20 Mar 2024
FViT: A Focal Vision Transformer with Gabor Filter
FViT: A Focal Vision Transformer with Gabor Filter
Yulong Shi
Mingwei Sun
Yongshuai Wang
Rui Wang
55
4
0
17 Feb 2024
EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention
EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention
Yulong Shi
Mingwei Sun
Yongshuai Wang
Hui Sun
Zengqiang Chen
34
4
0
10 Oct 2023
SparseSwin: Swin Transformer with Sparse Transformer Block
SparseSwin: Swin Transformer with Sparse Transformer Block
Krisna Pinasthika
Blessius Sheldo Putra Laksono
Riyandi Banovbi Putera Irsal
Syifa’ Hukma Shabiyya
N. Yudistira
ViT
26
15
0
11 Sep 2023
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
Zhuofan Xia
Xuran Pan
Shiji Song
Li Erran Li
Gao Huang
ViT
24
24
0
04 Sep 2023
ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts
ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts
Bilel Benjdira
Anis Koubaa
Anas M. Ali
LM&Ro
22
3
0
22 Aug 2023
ESSAformer: Efficient Transformer for Hyperspectral Image
  Super-resolution
ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution
Mingjin Zhang
Chi Zhang
Qiming Zhang
Jie-Ru Guo
Xinbo Gao
Jing Zhang
13
29
0
26 Jul 2023
Deep Image Matting: A Comprehensive Survey
Deep Image Matting: A Comprehensive Survey
Jizhizi Li
Jing Zhang
Dacheng Tao
VLM
33
14
0
10 Apr 2023
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
305
7,443
0
11 Nov 2021
Distract Your Attention: Multi-head Cross Attention Network for Facial
  Expression Recognition
Distract Your Attention: Multi-head Cross Attention Network for Facial Expression Recognition
Zhengyao Wen
Wen-Long Lin
Tao Wang
Ge Xu
CVBM
104
208
0
15 Sep 2021
CMT: Convolutional Neural Networks Meet Vision Transformers
CMT: Convolutional Neural Networks Meet Vision Transformers
Jianyuan Guo
Kai Han
Han Wu
Yehui Tang
Chunjing Xu
Yunhe Wang
Chang Xu
ViT
351
633
0
13 Jul 2021
Transformer in Transformer
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
284
1,524
0
27 Feb 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction
  without Convolutions
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
277
3,623
0
24 Feb 2021
Aggregated Residual Transformations for Deep Neural Networks
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Z. Tu
Kaiming He
297
10,220
0
16 Nov 2016
Semantic Understanding of Scenes through the ADE20K Dataset
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
253
1,828
0
18 Aug 2016
1