Transformer Language Models without Positional Encodings Still Learn Positional Information
arXiv:2203.16634. 30 March 2022.
Adi Haviv, Ori Ram, Ofir Press, Peter Izsak, Omer Levy
Papers citing "Transformer Language Models without Positional Encodings Still Learn Positional Information" (29 papers):

1. Jekyll-and-Hyde Tipping Point in an AI's Behavior. Neil F. Johnson, Frank Yingjie Huo. 29 Apr 2025.
2. Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More. Arvid Frydenlund. 13 Mar 2025. [LRM]
3. Evidence-Driven Marker Extraction for Social Media Suicide Risk Detection. Carter Adams, Caleb Carter, Jackson Simmons. 26 Feb 2025.
4. Number Cookbook: Number Understanding of Language Models and How to Improve It. Haotong Yang, Yi Hu, Shijia Kang, Zhouchen Lin, Muhan Zhang. 06 Nov 2024. [LRM]
5. Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study. Shawn Tan, Songlin Yang, Aaron Courville, Rameswar Panda, Yikang Shen. 23 Oct 2024.
6. The Mystery of the Pathological Path-star Task for Language Models. Arvid Frydenlund. 17 Oct 2024. [LRM]
7. SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers. Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, ..., Zhekai Zhang, Muyang Li, Ligeng Zhu, Yunfan LU, Song Han. 14 Oct 2024. [VLM]
8. Round and Round We Go! What makes Rotary Positional Encodings useful? Federico Barbero, Alex Vitvitskyi, Christos Perivolaropoulos, Razvan Pascanu, Petar Velickovic. 08 Oct 2024.
9. What Information Contributes to Log-based Anomaly Detection? Insights from a Configurable Transformer-Based Approach. Xingfang Wu, Heng Li, Foutse Khomh. 30 Sep 2024. [AI4TS]
10. Representing Rule-based Chatbots with Transformers. Dan Friedman, Abhishek Panigrahi, Danqi Chen. 15 Jul 2024.
11. Teaching Transformers Causal Reasoning through Axiomatic Training. Aniket Vashishtha, Abhinav Kumar, Abbavaram Gowtham Reddy, Vineeth N. Balasubramanian, Amit Sharma. 10 Jul 2024.
12. Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report. Franz Louis Cesista. 17 Jun 2024. [VGen]
13. Mitigate Position Bias in Large Language Models via Scaling a Single Dimension. Yijiong Yu, Huiqiang Jiang, Xufang Luo, Qianhui Wu, Chin-Yew Lin, Dongsheng Li, Yuqing Yang, Yongfeng Huang, L. Qiu. 04 Jun 2024.
14. A Morphology-Based Investigation of Positional Encodings. Poulami Ghosh, Shikhar Vashishth, Raj Dabre, Pushpak Bhattacharyya. 06 Apr 2024.
15. Breaking Symmetry When Training Transformers. Chunsheng Zuo, Michael Guerzhoy. 06 Feb 2024.
16. Positional Information Matters for Invariant In-Context Learning: A Case Study of Simple Function Classes. Yongqiang Chen, Binghui Xie, Kaiwen Zhou, Bo Han, Yatao Bian, James Cheng. 30 Nov 2023.
17. Language model acceptability judgements are not always robust to context. Koustuv Sinha, Jon Gauthier, Aaron Mueller, Kanishka Misra, Keren Fuentes, R. Levy, Adina Williams. 18 Dec 2022.
18. Word Order Matters when you Increase Masking. Karim Lasri, Alessandro Lenci, Thierry Poibeau. 08 Nov 2022.
19. What Language Model to Train if You Have One Million GPU Hours? Teven Le Scao, Thomas Wang, Daniel Hesslow, Lucile Saulnier, Stas Bekman, ..., Lintang Sutawika, Jaesung Tae, Zheng-Xin Yong, Julien Launay, Iz Beltagy. 27 Oct 2022. [MoE, AI4CE]
20. The Curious Case of Absolute Position Embeddings. Koustuv Sinha, Amirhossein Kazemnejad, Siva Reddy, J. Pineau, Dieuwke Hupkes, Adina Williams. 23 Oct 2022.
21. Transformers Learn Shortcuts to Automata. Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang. 19 Oct 2022. [OffRL, LRM]
22. Transparency Helps Reveal When Language Models Learn Meaning. Zhaofeng Wu, William Merrill, Hao Peng, Iz Beltagy, Noah A. Smith. 14 Oct 2022.
23. Achieving and Understanding Out-of-Distribution Generalization in Systematic Reasoning in Small-Scale Transformers. A. Nam, Mustafa Abdool, Trevor C. Maxfield, James L. McClelland. 07 Oct 2022. [NAI, LRM, AI4CE]
24. Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks. Yuxuan Li, James L. McClelland. 02 Oct 2022.
25. Fast-FNet: Accelerating Transformer Encoder Models via Efficient Fourier Layers. Nurullah Sevim, Ege Ozan Özyedek, Furkan Şahinuç, Aykut Koç. 26 Sep 2022.
26. Flamingo: a Visual Language Model for Few-Shot Learning. Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, ..., Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, Karen Simonyan. 29 Apr 2022. [MLLM, VLM]
27. Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. Ofir Press, Noah A. Smith, M. Lewis. 27 Aug 2021.
28. The Pile: An 800GB Dataset of Diverse Text for Language Modeling. Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy. 31 Dec 2020. [AIMat]
29. The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives. Elena Voita, Rico Sennrich, Ivan Titov. 03 Sep 2019.