ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.05969
  4. Cited By
Breaking Symmetry When Training Transformers
v1v2 (latest)

Breaking Symmetry When Training Transformers

6 February 2024
Chunsheng Zuo
Michael Guerzhoy
ArXiv (abs)PDFHTML

Papers citing "Breaking Symmetry When Training Transformers"

5 / 5 papers shown
Title
Transformer Language Models without Positional Encodings Still Learn
  Positional Information
Transformer Language Models without Positional Encodings Still Learn Positional Information
Adi Haviv
Ori Ram
Ofir Press
Peter Izsak
Omer Levy
104
128
0
30 Mar 2022
Attention is Not All You Need: Pure Attention Loses Rank Doubly
  Exponentially with Depth
Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
Yihe Dong
Jean-Baptiste Cordonnier
Andreas Loukas
147
388
0
05 Mar 2021
Designing and Interpreting Probes with Control Tasks
Designing and Interpreting Probes with Control Tasks
John Hewitt
Percy Liang
86
538
0
08 Sep 2019
Transformer Dissection: A Unified Understanding of Transformer's
  Attention via the Lens of Kernel
Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel
Yao-Hung Hubert Tsai
Shaojie Bai
M. Yamada
Louis-Philippe Morency
Ruslan Salakhutdinov
129
261
0
30 Aug 2019
Neural Machine Translation by Jointly Learning to Align and Translate
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
AIMat
580
27,338
0
01 Sep 2014
1