1711.03953
Cited By
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
10 November 2017
Zhilin Yang
Zihang Dai
Ruslan Salakhutdinov
William W. Cohen
BDL
Papers citing
"Breaking the Softmax Bottleneck: A High-Rank RNN Language Model"
50 / 79 papers shown
Title
Distribution Prompting: Understanding the Expressivity of Language Models Through the Next-Token Distributions They Can Produce
Haojin Wang
Zining Zhu
Freda Shi
12
0
0
18 May 2025
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
Yize Zhao
Tina Behnia
V. Vakilian
Christos Thrampoulidis
68
9
0
20 Feb 2025
Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies
Sijin Chen
Omar Hagrass
Jason M. Klusowski
32
3
0
04 Oct 2024
What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages
Nadav Borenstein
Anej Svete
R. Chan
Josef Valvoda
Franz Nowak
Isabelle Augenstein
Eleanor Chodroff
Ryan Cotterell
42
12
0
06 Jun 2024
Linguistic Collapse: Neural Collapse in (Large) Language Models
Robert Wu
Vardan Papyan
48
12
0
28 May 2024
On the Independence Assumption in Neurosymbolic Learning
Emile van Krieken
Pasquale Minervini
Edoardo Ponti
Antonio Vergari
48
11
0
12 Apr 2024
Multi-Objective Evolutionary Neural Architecture Search for Recurrent Neural Networks
Reinhard Booysen
Anna Sergeevna Bosman
40
1
0
17 Mar 2024
Hallucination is Inevitable: An Innate Limitation of Large Language Models
Ziwei Xu
Sanjay Jain
Mohan S. Kankanhalli
HILM
LRM
71
221
0
22 Jan 2024
Delving Deeper Into Astromorphic Transformers
Md. Zesun Ahmed Mia
Malyaban Bal
Abhronil Sengupta
36
1
0
18 Dec 2023
Revisiting the Architectures like Pointer Networks to Efficiently Improve the Next Word Distribution, Summarization Factuality, and Beyond
Haw-Shiuan Chang
Zonghai Yao
Alolika Gon
Hong-ye Yu
Andrew McCallum
46
10
0
20 May 2023
An Overview on Language Models: Recent Developments and Outlook
Chengwei Wei
Yun Cheng Wang
Bin Wang
C.-C. Jay Kuo
33
42
0
10 Mar 2023
Linear Spaces of Meanings: Compositional Structures in Vision-Language Models
Matthew Trager
Pramuditha Perera
L. Zancato
Alessandro Achille
Parminder Bhatia
Stefano Soatto
CoGe
38
30
0
28 Feb 2023
LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Retrieval
Ziyang Luo
Pu Zhao
Can Xu
Xiubo Geng
Tao Shen
Chongyang Tao
Jing Ma
Qingwen Lin
Daxin Jiang
VLM
CLIP
24
3
0
06 Feb 2023
Why do Nearest Neighbor Language Models Work?
Frank F. Xu
Uri Alon
Graham Neubig
RALM
30
21
0
07 Jan 2023
Training Integer-Only Deep Recurrent Neural Networks
V. Nia
Eyyub Sari
Vanessa Courville
M. Asgharian
MQ
53
2
0
22 Dec 2022
Nonparametric Masked Language Modeling
Sewon Min
Weijia Shi
M. Lewis
Xilun Chen
Wen-tau Yih
Hannaneh Hajishirzi
Luke Zettlemoyer
RALM
50
48
0
02 Dec 2022
Context Variance Evaluation of Pretrained Language Models for Prompt-based Biomedical Knowledge Probing
Zonghai Yao
Yi Cao
Zhichao Yang
Hong-ye Yu
27
17
0
18 Nov 2022
Reconciliation of Pre-trained Models and Prototypical Neural Networks in Few-shot Named Entity Recognition
Youcheng Huang
Wenqiang Lei
Jie Fu
Jiancheng Lv
24
3
0
07 Nov 2022
Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling
Haw-Shiuan Chang
Ruei-Yao Sun
Kathryn Ricci
Andrew McCallum
43
14
0
10 Oct 2022
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Yi Tay
Mostafa Dehghani
Samira Abnar
Hyung Won Chung
W. Fedus
J. Rao
Sharan Narang
Vinh Q. Tran
Dani Yogatama
Donald Metzler
AI4CE
34
100
0
21 Jul 2022
Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention
Tong Yu
Ruslan Khalitov
Lei Cheng
Zhirong Yang
MoE
27
10
0
22 Apr 2022
Dependency-based Mixture Language Models
Zhixian Yang
Xiaojun Wan
49
2
0
19 Mar 2022
Low-Rank Softmax Can Have Unargmaxable Classes in Theory but Rarely in Practice
Andreas Grivas
Nikolay Bogoychev
Adam Lopez
15
9
0
12 Mar 2022
Distributionally Robust Recurrent Decoders with Random Network Distillation
Antonio Valerio Miceli Barone
Alexandra Birch
Rico Sennrich
39
1
0
25 Oct 2021
iRNN: Integer-only Recurrent Neural Network
Eyyub Sari
Vanessa Courville
V. Nia
MQ
56
4
0
20 Sep 2021
Rare Tokens Degenerate All Tokens: Improving Neural Text Generation via Adaptive Gradient Gating for Rare Token Embeddings
Sangwon Yu
Jongyoon Song
Heeseung Kim
SeongEun Lee
Woo-Jong Ryu
Sung-Hoon Yoon
19
31
0
07 Sep 2021
Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives
Ben Saunders
Necati Cihan Camgöz
Richard Bowden
SLR
27
50
0
23 Jul 2021
Combiner: Full Attention Transformer with Sparse Computation Cost
Hongyu Ren
H. Dai
Zihang Dai
Mengjiao Yang
J. Leskovec
Dale Schuurmans
Bo Dai
87
77
0
12 Jul 2021
Which transformer architecture fits my data? A vocabulary bottleneck in self-attention
Noam Wies
Yoav Levine
Daniel Jannai
Amnon Shashua
40
20
0
09 May 2021
Learning Calibrated-Guidance for Object Detection in Aerial Images
Zongqi Wei
Dong Liang
Dong-Ming Zhang
Liyan Zhang
Qixiang Geng
Mingqiang Wei
Huiyu Zhou
30
35
0
21 Mar 2021
The Rediscovery Hypothesis: Language Models Need to Meet Linguistics
Vassilina Nikoulina
Maxat Tezekbayev
Nuradil Kozhakhmet
Madina Babazhanova
Matthias Gallé
Z. Assylbekov
34
8
0
02 Mar 2021
On the Sentence Embeddings from Pre-trained Language Models
Bohan Li
Hao Zhou
Junxian He
Mingxuan Wang
Yiming Yang
Lei Li
30
213
0
02 Nov 2020
Medical Code Assignment with Gated Convolution and Note-Code Interaction
Shaoxiong Ji
Shirui Pan
Pekka Marttinen
MedIm
30
18
0
14 Oct 2020
Automated Source Code Generation and Auto-completion Using Deep Learning: Comparing and Discussing Current Language-Model-Related Approaches
Juan Cruz-Benito
Sanjay Vishwakarma
Francisco Martín-Fernández
Ismael Faro (IBM Quantum)
22
30
0
16 Sep 2020
Temporal Convolutional Attention-based Network For Sequence Modeling
Hongyan Hao
Yan Wang
Siqiao Xue
Yudi Xia
Jian Zhao
S. Furao
30
41
0
28 Feb 2020
MaxUp: A Simple Way to Improve Generalization of Neural Network Training
Chengyue Gong
Tongzheng Ren
Mao Ye
Qiang Liu
AAML
27
56
0
20 Feb 2020
Low-Rank Bottleneck in Multi-head Attention Models
Srinadh Bhojanapalli
Chulhee Yun
A. S. Rawat
Sashank J. Reddi
Sanjiv Kumar
24
94
0
17 Feb 2020
Softmax-based Classification is k-means Clustering: Formal Proof, Consequences for Adversarial Attacks, and Improvement through Centroid Based Tailoring
Sibylle Hess
W. Duivesteijn
Decebal Constantin Mocanu
20
12
0
07 Jan 2020
Paraphrase Generation with Latent Bag of Words
Yao Fu
Yansong Feng
John P. Cunningham
BDL
25
91
0
07 Jan 2020
Efficient Decoupled Neural Architecture Search by Structure and Operation Sampling
Heung-Chang Lee
Do-Guk Kim
Bohyung Han
38
6
0
23 Oct 2019
Searching for A Robust Neural Architecture in Four GPU Hours
Xuanyi Dong
Yezhou Yang
20
646
0
10 Oct 2019
Improving Pre-Trained Multilingual Models with Vocabulary Expansion
Hai Wang
Dian Yu
Kai Sun
Jianshu Chen
Dong Yu
30
41
0
26 Sep 2019
Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes
Noémien Kocher
Christian Scuito
Lorenzo Tarantino
Alexandros Lazaridis
Andreas Fischer
C. Musat
23
0
0
18 Sep 2019
Relaxed Softmax for learning from Positive and Unlabeled data
Ugo Tanielian
Flavian Vasile
18
9
0
17 Sep 2019
PaLM: A Hybrid Parser and Language Model
Hao Peng
Roy Schwartz
Noah A. Smith
AIMat
23
15
0
04 Sep 2019
Efficient Novelty-Driven Neural Architecture Search
Miao Zhang
Huiqi Li
Shirui Pan
Taoping Liu
Steven W. Su
23
1
0
22 Jul 2019
ER-AE: Differentially Private Text Generation for Authorship Anonymization
Haohan Bo
Steven H. H. Ding
Benjamin C. M. Fung
Farkhund Iqbal
DeLMO
39
38
0
20 Jul 2019
Evaluating Computational Language Models with Scaling Properties of Natural Language
Shuntaro Takahashi
Kumiko Tanaka-Ishii
16
23
0
22 Jun 2019
Barack's Wife Hillary: Using Knowledge-Graphs for Fact-Aware Language Modeling
Robert L. Logan IV
Nelson F. Liu
Matthew E. Peters
Matt Gardner
Sameer Singh
RALM
25
186
0
17 Jun 2019
Learning Representations by Maximizing Mutual Information Across Views
Philip Bachman
R. Devon Hjelm
William Buchwalter
SSL
72
1,455
0
03 Jun 2019