A Study on ReLU and Softmax in Transformer
Kai Shen, Junliang Guo, Xuejiao Tan, Siliang Tang, Rui Wang, Jiang Bian
arXiv:2302.06461, 13 February 2023
Papers citing "A Study on ReLU and Softmax in Transformer" (33 papers shown):

1. Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity (26 Apr 2025)
   Ruifeng Ren, Yong Liu
2. Quantum Doubly Stochastic Transformers (22 Apr 2025)
   Jannis Born, Filip Skogh, Kahn Rhrissorrakrai, Filippo Utro, Nico Wagner, Aleksandros Sobczyk
3. Process Reward Modeling with Entropy-Driven Uncertainty (28 Mar 2025)
   Lang Cao, Renhong Chen, Yingtian Zou, Chao Peng, Wu Ning, ..., Yansen Wang, Peishuo Su, Mofan Peng, Zijie Chen, Yitong Li
4. InhibiDistilbert: Knowledge Distillation for a ReLU and Addition-based Transformer (20 Mar 2025)
   Tony Zhang, Rickard Brännvall
5. A Compact Model for Large-Scale Time Series Forecasting (28 Feb 2025) [AI4TS]
   Chin-Chia Michael Yeh, Xiran Fan, Zhimeng Jiang, Yujie Fan, Huiyuan Chen, ..., Xin Dai, Jiadong Wang, Zhongfang Zhuang, Liang Wang, Yan Zheng
6. Self-Adjust Softmax (25 Feb 2025)
   Chuanyang Zheng, Yihang Gao, Guoxuan Chen, Han Shi, Jing Xiong, Xiaozhe Ren, Chao Huang, Xin Jiang, Zhiyu Li, Yu Li
7. On Space Folds of ReLU Neural Networks (17 Feb 2025) [MLT]
   Michal Lewandowski, Hamid Eghbalzadeh, Bernhard Heinzl, Raphael Pisoni, Bernhard A. Moser
8. ZETA: Leveraging Z-order Curves for Efficient Top-k Attention (24 Jan 2025)
   Qiuhao Zeng, Jerry Huang, Peng Lu, Gezheng Xu, Boxing Chen, Charles Ling, Boyu Wang
9. More Expressive Attention with Negative Weights (11 Nov 2024)
   Ang Lv, Ruobing Xie, Shuaipeng Li, Jiayi Liao, Xingchen Sun, Zhanhui Kang, Di Wang, Rui Yan
10. Rethinking Attention: Polynomial Alternatives to Softmax in Transformers (24 Oct 2024)
    Hemanth Saratchandran, Jianqiao Zheng, Yiping Ji, Wenbo Zhang, Simon Lucey
11. HSR-Enhanced Sparse Attention Acceleration (14 Oct 2024)
    Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
12. Improving Image Clustering with Artifacts Attenuation via Inference-Time Attention Engineering (07 Oct 2024) [ViT]
    Kazumoto Nakamura, Yuji Nozawa, Yu-Chieh Lin, K. Nakata, Youyang Ng
13. Attention layers provably solve single-location regression (02 Oct 2024)
    P. Marion, Raphael Berthier, Gérard Biau, Claire Boyer
14. Sampling Foundational Transformer: A Theoretical Perspective (11 Aug 2024)
    Viet Anh Nguyen, Minh Lenhat, Khoa Nguyen, Duong Duc Hieu, Dao Huu Hung, Truong-Son Hy
15. A Non-negative VAE: the Generalized Gamma Belief Network (06 Aug 2024) [BDL]
    Zhibin Duan, Tiansheng Wen, Muyao Wang, Bo Chen, Mingyuan Zhou
16. Learning Neural Networks with Sparse Activations (26 Jun 2024)
    Pranjal Awasthi, Nishanth Dikkala, Pritish Kamath, Raghu Meka
17. Optimized Speculative Sampling for GPU Hardware Accelerators (16 Jun 2024)
    Dominik Wagner, Seanie Lee, Ilja Baumann, Philipp Seeberger, Korbinian Riedhammer, Tobias Bocklet
18. TransFusion: Contrastive Learning with Transformers (27 Mar 2024)
    Huanran Li, Daniel Pimentel-Alarcón
19. Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture (12 Mar 2024)
    Fei Wang, Dan Guo, Kun Li, Zhun Zhong, Mengqing Wang
20. ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models (21 Feb 2024)
    Chenyang Song, Xu Han, Zhengyan Zhang, Shengding Hu, Xiyu Shi, ..., Chen Chen, Zhiyuan Liu, Guanglin Li, Tao Yang, Maosong Sun
21. On Provable Length and Compositional Generalization (07 Feb 2024) [OODD]
    Kartik Ahuja, Amin Mansouri
22. Unification of Symmetries Inside Neural Networks: Transformer, Feedforward and Neural ODE (04 Feb 2024) [AI4CE]
    Koji Hashimoto, Yuji Hirono, Akiyoshi Sannai
23. Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models (03 Feb 2024) [LRM]
    Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, A. Eshaghi
24. Improving fine-grained understanding in image-text pre-training (18 Jan 2024) [VLM]
    Ioana Bica, Anastasija Ilić, Matthias Bauer, Goker Erdogan, Matko Bošnjak, ..., A. Gritsenko, Matthias Minderer, Charles Blundell, Razvan Pascanu, Jovana Mitrović
25. Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems (19 Oct 2023)
    David T. Hoffmann, Simon Schrodi, Jelena Bratulić, Nadine Behrmann, Volker Fischer, Thomas Brox
26. Approximating Two-Layer Feedforward Networks for Efficient Transformers (16 Oct 2023) [MoE]
    Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
27. How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations (16 Oct 2023)
    Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai
28. Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining (12 Oct 2023) [OffRL]
    Licong Lin, Yu Bai, Song Mei
29. ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models (06 Oct 2023)
    Iman Mirzadeh, Keivan Alizadeh-Vahid, Sachin Mehta, C. C. D. Mundo, Oncel Tuzel, Golnoosh Samei, Mohammad Rastegari, Mehrdad Farajtabar
30. The Inhibitor: ReLU and Addition-Based Attention for Efficient Transformers (03 Oct 2023)
    Rickard Brännvall
31. Replacing softmax with ReLU in Vision Transformers (15 Sep 2023) [ViT]
    Mitchell Wortsman, Jaehoon Lee, Justin Gilmer, Simon Kornblith
32. What can a Single Attention Layer Learn? A Study Through the Random Features Lens (21 Jul 2023) [MLT]
    Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei
33. Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis (14 Jul 2023)
    Ziyue Jiang, Jinglin Liu, Yi Ren, Jinzheng He, Zhe Ye, ..., Pengfei Wei, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao