A Study on ReLU and Softmax in Transformer
Kai Shen, Junliang Guo, Xuejiao Tan, Siliang Tang, Rui Wang, Jiang Bian
13 February 2023 · arXiv: 2302.06461

Papers citing "A Study on ReLU and Softmax in Transformer" (33 papers shown)
Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity
  Ruifeng Ren, Yong Liu · 26 Apr 2025
Quantum Doubly Stochastic Transformers
  Jannis Born, Filip Skogh, Kahn Rhrissorrakrai, Filippo Utro, Nico Wagner, Aleksandros Sobczyk · 22 Apr 2025
Process Reward Modeling with Entropy-Driven Uncertainty
  Lang Cao, Renhong Chen, Yingtian Zou, Chao Peng, Wu Ning, ..., Yixuan Wang, Peishuo Su, Mofan Peng, Zijie Chen, Yitong Li · 28 Mar 2025
InhibiDistilbert: Knowledge Distillation for a ReLU and Addition-based Transformer
  Tony Zhang, Rickard Brännvall · 20 Mar 2025
A Compact Model for Large-Scale Time Series Forecasting
  Chin-Chia Michael Yeh, Xiran Fan, Zhimeng Jiang, Yujie Fan, Huiyuan Chen, ..., Xin Dai, J. Wang, Zhongfang Zhuang, Liang Wang, Yan Zheng · 28 Feb 2025 · AI4TS
Self-Adjust Softmax
  Chuanyang Zheng, Yihang Gao, Guoxuan Chen, Han Shi, Jing Xiong, Xiaozhe Ren, Chao Huang, Xin Jiang, Z. Li, Yu-Hu Li · 25 Feb 2025
On Space Folds of ReLU Neural Networks
  Michal Lewandowski, Hamid Eghbalzadeh, Bernhard Heinzl, Raphael Pisoni, Bernhard A. Moser · 17 Feb 2025 · MLT
ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
  Qiuhao Zeng, Jerry Huang, Peng Lu, Gezheng Xu, Boxing Chen, Charles Ling, Boyu Wang · 24 Jan 2025
More Expressive Attention with Negative Weights
  Ang Lv, Ruobing Xie, Shuaipeng Li, Jiayi Liao, Xingchen Sun, Zhanhui Kang, Di Wang, Rui Yan · 11 Nov 2024
Rethinking Attention: Polynomial Alternatives to Softmax in Transformers
  Hemanth Saratchandran, Jianqiao Zheng, Yiping Ji, Wenbo Zhang, Simon Lucey · 24 Oct 2024
HSR-Enhanced Sparse Attention Acceleration
  Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao-quan Song · 14 Oct 2024
Improving Image Clustering with Artifacts Attenuation via Inference-Time Attention Engineering
  Kazumoto Nakamura, Yuji Nozawa, Yu-Chieh Lin, K. Nakata, Youyang Ng · 07 Oct 2024 · ViT
Attention layers provably solve single-location regression
  P. Marion, Raphael Berthier, Gérard Biau, Claire Boyer · 02 Oct 2024
Sampling Foundational Transformer: A Theoretical Perspective
  Viet Anh Nguyen, Minh Lenhat, Khoa Nguyen, Duong Duc Hieu, Dao Huu Hung, Truong Son-Hy · 11 Aug 2024
A Non-negative VAE: the Generalized Gamma Belief Network
  Zhibin Duan, Tiansheng Wen, Muyao Wang, Bo Chen, Mingyuan Zhou · 06 Aug 2024 · BDL
Learning Neural Networks with Sparse Activations
  Pranjal Awasthi, Nishanth Dikkala, Pritish Kamath, Raghu Meka · 26 Jun 2024
Optimized Speculative Sampling for GPU Hardware Accelerators
  Dominik Wagner, Seanie Lee, Ilja Baumann, Philipp Seeberger, K. Riedhammer, Tobias Bocklet · 16 Jun 2024
TransFusion: Contrastive Learning with Transformers
  Huanran Li, Daniel Pimentel-Alarcón · 27 Mar 2024
Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture
  Fei Wang, Dan Guo, Kun Li, Zhun Zhong, Mengqing Wang · 12 Mar 2024
ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models
  Chenyang Song, Xu Han, Zhengyan Zhang, Shengding Hu, Xiyu Shi, ..., Chen Chen, Zhiyuan Liu, Guanglin Li, Tao Yang, Maosong Sun · 21 Feb 2024
On Provable Length and Compositional Generalization
  Kartik Ahuja, Amin Mansouri · 07 Feb 2024 · OODD
Unification of Symmetries Inside Neural Networks: Transformer, Feedforward and Neural ODE
  Koji Hashimoto, Yuji Hirono, Akiyoshi Sannai · 04 Feb 2024 · AI4CE
Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models
  Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, A. Eshaghi · 03 Feb 2024 · LRM
Improving fine-grained understanding in image-text pre-training
  Ioana Bica, Anastasija Ilić, Matthias Bauer, Goker Erdogan, Matko Bošnjak, ..., A. Gritsenko, Matthias Minderer, Charles Blundell, Razvan Pascanu, Jovana Mitrović · 18 Jan 2024 · VLM
Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
  David T. Hoffmann, Simon Schrodi, Jelena Bratulić, Nadine Behrmann, Volker Fischer, Thomas Brox · 19 Oct 2023
Approximating Two-Layer Feedforward Networks for Efficient Transformers
  Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber · 16 Oct 2023 · MoE
How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations
  Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai · 16 Oct 2023
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
  Licong Lin, Yu Bai, Song Mei · 12 Oct 2023 · OffRL
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
  Iman Mirzadeh, Keivan Alizadeh-Vahid, Sachin Mehta, C. C. D. Mundo, Oncel Tuzel, Golnoosh Samei, Mohammad Rastegari, Mehrdad Farajtabar · 06 Oct 2023
The Inhibitor: ReLU and Addition-Based Attention for Efficient Transformers
  Rickard Brännvall · 03 Oct 2023
Replacing softmax with ReLU in Vision Transformers
  Mitchell Wortsman, Jaehoon Lee, Justin Gilmer, Simon Kornblith · 15 Sep 2023 · ViT
What can a Single Attention Layer Learn? A Study Through the Random Features Lens
  Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei · 21 Jul 2023 · MLT
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
  Ziyue Jiang, Jinglin Liu, Yi Ren, Jinzheng He, Zhe Ye, ..., Pengfei Wei, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao · 14 Jul 2023