Image Transformer

15 February 2018
Niki Parmar
Ashish Vaswani
Jakob Uszkoreit
Lukasz Kaiser
Noam M. Shazeer
Alexander Ku
Dustin Tran
    ViT

Papers citing "Image Transformer"

50 / 837 papers shown
Title
Efficient Learnable Collaborative Attention for Single Image Super-Resolution
Yi-Gang Zhao
Jian-Nan Su
Guang-yong Chen
Min Gan
62
0
0
07 Apr 2024
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Debang Li
Junshi Huang
108
26
0
06 Apr 2024
ParCo: Part-Coordinating Text-to-Motion Synthesis
Qiran Zou
Shangyuan Yuan
Shian Du
Yu Wang
Chang-Shu Liu
Yi Tian Xu
Jie Chen
Xiangyang Ji
75
20
0
27 Mar 2024
Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling
W. G. C. Bandara
Vishal M. Patel
VPVLM
VLM
83
1
0
11 Mar 2024
Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level
Ali Hassani
Wen-mei W. Hwu
Humphrey Shi
66
9
0
07 Mar 2024
UB-FineNet: Urban Building Fine-grained Classification Network for Open-access Satellite Images
Zhiyi He
Wei Yao
Jie Shao
Puzuo Wang
95
7
0
04 Mar 2024
Large Convolutional Model Tuning via Filter Subspace
Wei Chen
Zichen Miao
Qiang Qiu
229
4
0
01 Mar 2024
Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
Mahdi Karami
Ali Ghodsi
VLM
116
6
0
28 Feb 2024
Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers
Markus Hiller
Krista A. Ehinger
Tom Drummond
110
4
0
19 Feb 2024
WildFake: A Large-scale Challenging Dataset for AI-Generated Images Detection
Yan Hong
Jianfu Zhang
141
13
0
19 Feb 2024
Prospector Heads: Generalized Feature Attribution for Large Models & Data
Gautam Machiraju
Alexander Derry
Arjun D Desai
Neel Guha
Amir-Hossein Karimi
James Zou
Russ Altman
Christopher Ré
Parag Mallick
AI4TS
MedIm
121
0
0
18 Feb 2024
Visual Concept-driven Image Generation with Text-to-Image Diffusion Model
Tanzila Rahman
Shweta Mahajan
Hsin-Ying Lee
Jian Ren
Sergey Tulyakov
Leonid Sigal
167
4
0
18 Feb 2024
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
Harry Dong
Xinyu Yang
Zhenyu Zhang
Zhangyang Wang
Yuejie Chi
Beidi Chen
80
54
0
14 Feb 2024
Spatially-Attentive Patch-Hierarchical Network with Adaptive Sampling for Motion Deblurring
Maitreya Suin
Kuldeep Purohit
A. N. Rajagopalan
65
0
0
09 Feb 2024
Data-efficient Large Vision Models through Sequential Autoregression
Jianyuan Guo
Zhiwei Hao
Chengcheng Wang
Yehui Tang
Han Wu
Han Hu
Kai Han
Chang Xu
VLM
110
10
0
07 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
156
35
0
05 Feb 2024
Exploring the Synergies of Hybrid CNNs and ViTs Architectures for Computer Vision: A survey
Haruna Yunusa
Shiyin Qin
Abdulrahman Hamman Adama Chukkol
Abdulganiyu Abdu Yusuf
Isah Bello
A. Lawan
ViT
116
14
0
05 Feb 2024
OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning
Chu Myaet Thwal
Minh N. H. Nguyen
Ye Lin Tun
Seongjin Kim
My T. Thai
Choong Seon Hong
125
7
0
22 Jan 2024
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
Nanye Ma
Mark Goldstein
M. S. Albergo
Nicholas M. Boffi
Eric Vanden-Eijnden
Saining Xie
DiffM
156
214
0
16 Jan 2024
Scalable Pre-training of Large Autoregressive Image Models
Alaaeldin El-Nouby
Michal Klein
Shuangfei Zhai
Miguel Angel Bautista
Alexander Toshev
Vaishaal Shankar
J. Susskind
Armand Joulin
VLM
105
80
0
16 Jan 2024
Jump Cut Smoothing for Talking Heads
Xiaojuan Wang
Taesung Park
Yang Zhou
Eli Shechtman
Richard Zhang
VGen
79
1
0
09 Jan 2024
CrisisViT: A Robust Vision Transformer for Crisis Image Classification
Zijun Long
R. McCreadie
Muhammad Imran
151
10
0
05 Jan 2024
GTA: Guided Transfer of Spatial Attention from Object-Centric Representations
SeokHyun Seo
Jinwoo Hong
Jungwoo Chae
Kyungyul Kim
Sangheum Hwang
77
0
0
05 Jan 2024
Latte: Latent Diffusion Transformer for Video Generation
Xin Ma
Yaohui Wang
Gengyun Jia
Xinyuan Chen
Ziqiang Liu
Yuan-Fang Li
Cunjian Chen
Yu Qiao
DiffM
VGen
291
280
0
05 Jan 2024
BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
Yiran Song
Qianyu Zhou
Hefei Ling
Deng-Ping Fan
Xuequan Lu
Lizhuang Ma
VLM
143
15
0
04 Jan 2024
PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
Yunhe Wang
Hanting Chen
Yehui Tang
Tianyu Guo
Kai Han
...
Qinghua Xu
Qun Liu
Jun Yao
Chao Xu
Dacheng Tao
128
20
0
27 Dec 2023
Diffusion Models With Learned Adaptive Noise
Subham Sekhar Sahoo
Aaron Gokaslan
Christopher De Sa
Volodymyr Kuleshov
DiffM
120
16
0
20 Dec 2023
Integrating Human Vision Perception in Vision Transformers for Classifying Waste Items
Akshat Shrivastava
Tapan K. Gandhi
75
1
0
19 Dec 2023
Linear Attention via Orthogonal Memory
Jun Zhang
Shuyang Jiang
Jiangtao Feng
Lin Zheng
Lingpeng Kong
98
3
0
18 Dec 2023
GIVT: Generative Infinite-Vocabulary Transformers
Michael Tschannen
Cian Eastwood
Fabian Mentzer
105
41
0
04 Dec 2023
SCHEME: Scalable Channel Mixer for Vision Transformers
Deepak Sridhar
Yunsheng Li
Nuno Vasconcelos
170
0
0
01 Dec 2023
SinSR: Diffusion-Based Image Super-Resolution in a Single Step
Yufei Wang
Wenhan Yang
Xinyuan Chen
Yaohui Wang
Lanqing Guo
Lap-Pui Chau
Ziwei Liu
Yu Qiao
Alex C. Kot
Bihan Wen
DiffM
171
121
0
23 Nov 2023
Long-MIL: Scaling Long Contextual Multiple Instance Learning for Histopathology Whole Slide Image Analysis
Honglin Li
Yunlong Zhang
Chenglu Zhu
Jiatong Cai
Sunyi Zheng
Lin Yang
VLM
94
4
0
21 Nov 2023
Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models
Yeongbin Kim
Gautam Singh
Junyeong Park
Çağlar Gülçehre
Sungjin Ahn
OCL
VLM
111
5
0
15 Nov 2023
MuST: Multimodal Spatiotemporal Graph-Transformer for Hospital Readmission Prediction
Yan Miao
Lequan Yu
26
2
0
11 Nov 2023
A Hierarchical Spatial Transformer for Massive Point Samples in Continuous Space
Wenchong He
Zhe Jiang
Tingsong Xiao
Zelin Xu
Shigang Chen
Ronald Fick
Miles Medina
Christine Angelini
94
17
0
08 Nov 2023
Multimodal Machine Learning in Image-Based and Clinical Biomedicine: Survey and Prospects
Elisa Warner
Joonsan Lee
William Hsu
Tanveer Syeda-Mahmood
Charles Kahn
Olivier Gevaert
Arvind Rao
LM&MA
101
11
0
04 Nov 2023
Revamping AI Models in Dermatology: Overcoming Critical Challenges for Enhanced Skin Lesion Diagnosis
Deval Mehta
B. Betz‐Stablein
Toàn D. Nguyên
Yaniv Gal
Adrian Bowling
...
M. Sashindranath
Paul Bonnington
Victoria Mar
Peter Soyer
Zongyuan Ge
VLM
77
2
0
02 Nov 2023
Improving Robustness for Vision Transformer with a Simple Dynamic Scanning Augmentation
Shashank Kotyan
Danilo Vasconcellos Vargas
ViT
75
4
0
01 Nov 2023
DiffEnc: Variational Diffusion with a Learned Encoder
Beatrix M. G. Nielsen
Anders Christensen
Andrea Dittadi
Ole Winther
DiffM
125
13
0
30 Oct 2023
ViR: Towards Efficient Vision Retention Backbones
Ali Hatamizadeh
Michael Ranzinger
Shiyi Lan
Jose M. Alvarez
Sanja Fidler
Jan Kautz
GNN
38
2
0
30 Oct 2023
Pre-training with Random Orthogonal Projection Image Modeling
Maryam Haghighat
Peyman Moghadam
Shaheer Mohamed
Piotr Koniusz
VLM
87
9
0
28 Oct 2023
Grid Jigsaw Representation with CLIP: A New Perspective on Image Clustering
Zijie Song
Zhenzhen Hu
Richang Hong
SSL
113
0
0
27 Oct 2023
Generative Marginalization Models
Sulin Liu
Peter J. Ramadge
Ryan P. Adams
76
1
0
19 Oct 2023
Improved Operator Learning by Orthogonal Attention
Zipeng Xiao
Zhongkai Hao
Bokai Lin
Zhijie Deng
Hang Su
128
21
0
19 Oct 2023
3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers
Jieneng Chen
Jieru Mei
Xianhang Li
Yongyi Lu
Qihang Yu
...
M. Lungren
Lei Xing
Le Lu
Alan Yuille
Yuyin Zhou
MedIm
ViT
97
39
0
11 Oct 2023
Accelerating Vision Transformers Based on Heterogeneous Attention Patterns
Deli Yu
Teng Xi
Jianwei Li
Baopu Li
Gang Zhang
Haocheng Feng
Junyu Han
Jingtuo Liu
Errui Ding
Jingdong Wang
ViT
81
1
0
11 Oct 2023
Sound-skwatter (Did You Mean: Sound-squatter?) AI-powered Generator for Phishing Prevention
R. Valentim
Idilio Drago
Marco Mellia
Federico Cerutti
25
1
0
10 Oct 2023
Anchor-Intermediate Detector: Decoupling and Coupling Bounding Boxes for Accurate Object Detection
Yilong Lv
Min Li
Yujie He
Shaopeng Li
Zhuzhen He
Aitao Yang
45
1
0
09 Oct 2023
Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers
Shiyue Cao
Yueqin Yin
Lianghua Huang
Yu Liu
Xin Zhao
Deli Zhao
Kaiqi Huang
ViT
100
19
0
09 Oct 2023