Transformers Learn Low Sensitivity Functions: Investigations and Implications
arXiv: 2403.06925

11 March 2024
Bhavya Vasudeva
Deqing Fu
Tianyi Zhou
Elliott Kau
Youqi Huang
Vatsal Sharan

Papers citing "Transformers Learn Low Sensitivity Functions: Investigations and Implications"

8 / 8 papers shown
Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features
Simone Bombari
Marco Mondelli
46
4
0
05 Feb 2024
Improving the Robustness of Transformer-based Large Language Models with Dynamic Attention
Lujia Shen
Yuwen Pu
Shouling Ji
Changjiang Li
Xuhong Zhang
Chunpeng Ge
Ting Wang
AAML
34
4
0
29 Nov 2023
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna
Ollie Liu
Alexandre Variengien
LRM
212
123
0
30 Apr 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
221
515
0
01 Nov 2022
Patches Are All You Need?
Asher Trockman
J. Zico Kolter
ViT
225
403
0
24 Jan 2022
Intriguing Properties of Vision Transformers
Muzammal Naseer
Kanchana Ranasinghe
Salman Khan
Munawar Hayat
Fahad Shahbaz Khan
Ming-Hsuan Yang
ViT
265
626
0
21 May 2021
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
312
2,896
0
15 Sep 2016
Densely Connected Convolutional Networks
Gao Huang
Zhuang Liu
Laurens van der Maaten
Kilian Q. Weinberger
PINN
3DV
348
36,437
0
25 Aug 2016