Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

10 November 2017

Papers citing "Breaking the Softmax Bottleneck: A High-Rank RNN Language Model"


Title
Distribution Prompting: Understanding the Expressivity of Language Models Through the Next-Token Distributions They Can Produce Haojin Wang Zining Zhu Freda Shi 12 0 0 18 May 2025
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations Yize Zhao Tina Behnia V. Vakilian Christos Thrampoulidis 68 9 0 20 Feb 2025
Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies Sijin Chen Omar Hagrass Jason M. Klusowski 32 3 0 04 Oct 2024
What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages Nadav Borenstein Anej Svete R. Chan Josef Valvoda Franz Nowak Isabelle Augenstein Eleanor Chodroff Ryan Cotterell 42 12 0 06 Jun 2024
Linguistic Collapse: Neural Collapse in (Large) Language Models Robert Wu Vardan Papyan 48 12 0 28 May 2024
On the Independence Assumption in Neurosymbolic Learning Emile van Krieken Pasquale Minervini Edoardo Ponti Antonio Vergari 48 11 0 12 Apr 2024
Multi-Objective Evolutionary Neural Architecture Search for Recurrent Neural Networks Reinhard Booysen Anna Sergeevna Bosman 40 1 0 17 Mar 2024
Hallucination is Inevitable: An Innate Limitation of Large Language Models Ziwei Xu Sanjay Jain Mohan S. Kankanhalli HILM LRM 71 221 0 22 Jan 2024
Delving Deeper Into Astromorphic Transformers Md. Zesun Ahmed Mia Malyaban Bal Abhronil Sengupta 36 1 0 18 Dec 2023
Revisiting the Architectures like Pointer Networks to Efficiently Improve the Next Word Distribution, Summarization Factuality, and Beyond Haw-Shiuan Chang Zonghai Yao Alolika Gon Hong-ye Yu Andrew McCallum 46 10 0 20 May 2023
An Overview on Language Models: Recent Developments and Outlook Chengwei Wei Yun Cheng Wang Bin Wang C.-C. Jay Kuo 33 42 0 10 Mar 2023
Linear Spaces of Meanings: Compositional Structures in Vision-Language Models Matthew Trager Pramuditha Perera L. Zancato Alessandro Achille Parminder Bhatia Stefano Soatto CoGe 38 30 0 28 Feb 2023
LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Retrieval Ziyang Luo Pu Zhao Can Xu Xiubo Geng Tao Shen Chongyang Tao Jing Ma Qingwen Lin Daxin Jiang VLM CLIP 24 3 0 06 Feb 2023
Why do Nearest Neighbor Language Models Work? Frank F. Xu Uri Alon Graham Neubig RALM 30 21 0 07 Jan 2023
Training Integer-Only Deep Recurrent Neural Networks V. Nia Eyyub Sari Vanessa Courville M. Asgharian MQ 53 2 0 22 Dec 2022
Nonparametric Masked Language Modeling Sewon Min Weijia Shi M. Lewis Xilun Chen Wen-tau Yih Hannaneh Hajishirzi Luke Zettlemoyer RALM 50 48 0 02 Dec 2022
Context Variance Evaluation of Pretrained Language Models for Prompt-based Biomedical Knowledge Probing Zonghai Yao Yi Cao Zhichao Yang Hong-ye Yu 27 17 0 18 Nov 2022
Reconciliation of Pre-trained Models and Prototypical Neural Networks in Few-shot Named Entity Recognition Youcheng Huang Wenqiang Lei Jie Fu Jiancheng Lv 24 3 0 07 Nov 2022
Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling Haw-Shiuan Chang Ruei-Yao Sun Kathryn Ricci Andrew McCallum 43 14 0 10 Oct 2022
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling? Yi Tay Mostafa Dehghani Samira Abnar Hyung Won Chung W. Fedus J. Rao Sharan Narang Vinh Q. Tran Dani Yogatama Donald Metzler AI4CE 34 100 0 21 Jul 2022
Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention Tong Yu Ruslan Khalitov Lei Cheng Zhirong Yang MoE 27 10 0 22 Apr 2022
Dependency-based Mixture Language Models Zhixian Yang Xiaojun Wan 49 2 0 19 Mar 2022
Low-Rank Softmax Can Have Unargmaxable Classes in Theory but Rarely in Practice Andreas Grivas Nikolay Bogoychev Adam Lopez 15 9 0 12 Mar 2022
Distributionally Robust Recurrent Decoders with Random Network Distillation Antonio Valerio Miceli Barone Alexandra Birch Rico Sennrich 39 1 0 25 Oct 2021
iRNN: Integer-only Recurrent Neural Network Eyyub Sari Vanessa Courville V. Nia MQ 56 4 0 20 Sep 2021
Rare Tokens Degenerate All Tokens: Improving Neural Text Generation via Adaptive Gradient Gating for Rare Token Embeddings Sangwon Yu Jongyoon Song Heeseung Kim SeongEun Lee Woo-Jong Ryu Sung-Hoon Yoon 19 31 0 07 Sep 2021
Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives Ben Saunders Necati Cihan Camgöz Richard Bowden SLR 27 50 0 23 Jul 2021
Combiner: Full Attention Transformer with Sparse Computation Cost Hongyu Ren H. Dai Zihang Dai Mengjiao Yang J. Leskovec Dale Schuurmans Bo Dai 87 77 0 12 Jul 2021
Which transformer architecture fits my data? A vocabulary bottleneck in self-attention Noam Wies Yoav Levine Daniel Jannai Amnon Shashua 40 20 0 09 May 2021
Learning Calibrated-Guidance for Object Detection in Aerial Images Zongqi Wei Dong Liang Dong-Ming Zhang Liyan Zhang Qixiang Geng Mingqiang Wei Huiyu Zhou 30 35 0 21 Mar 2021
The Rediscovery Hypothesis: Language Models Need to Meet Linguistics Vassilina Nikoulina Maxat Tezekbayev Nuradil Kozhakhmet Madina Babazhanova Matthias Gallé Z. Assylbekov 34 8 0 02 Mar 2021
On the Sentence Embeddings from Pre-trained Language Models Bohan Li Hao Zhou Junxian He Mingxuan Wang Yiming Yang Lei Li 30 213 0 02 Nov 2020
Medical Code Assignment with Gated Convolution and Note-Code Interaction Shaoxiong Ji Shirui Pan Pekka Marttinen MedIm 30 18 0 14 Oct 2020
Automated Source Code Generation and Auto-completion Using Deep Learning: Comparing and Discussing Current Language-Model-Related Approaches Juan Cruz-Benito Sanjay Vishwakarma Francisco Martín-Fernández Ismael Faro IBM Quantum 22 30 0 16 Sep 2020
Temporal Convolutional Attention-based Network For Sequence Modeling Hongyan Hao Yan Wang Siqiao Xue Yudi Xia Jian Zhao S. Furao 30 41 0 28 Feb 2020
MaxUp: A Simple Way to Improve Generalization of Neural Network Training Chengyue Gong Tongzheng Ren Mao Ye Qiang Liu AAML 27 56 0 20 Feb 2020
Low-Rank Bottleneck in Multi-head Attention Models Srinadh Bhojanapalli Chulhee Yun A. S. Rawat Sashank J. Reddi Sanjiv Kumar 24 94 0 17 Feb 2020
Softmax-based Classification is k-means Clustering: Formal Proof, Consequences for Adversarial Attacks, and Improvement through Centroid Based Tailoring Sibylle Hess W. Duivesteijn Decebal Constantin Mocanu 20 12 0 07 Jan 2020
Paraphrase Generation with Latent Bag of Words Yao Fu Yansong Feng John P. Cunningham BDL 25 91 0 07 Jan 2020
Efficient Decoupled Neural Architecture Search by Structure and Operation Sampling Heung-Chang Lee Do-Guk Kim Bohyung Han 38 6 0 23 Oct 2019
Searching for A Robust Neural Architecture in Four GPU Hours Xuanyi Dong Yezhou Yang 20 646 0 10 Oct 2019
Improving Pre-Trained Multilingual Models with Vocabulary Expansion Hai Wang Dian Yu Kai Sun Jianshu Chen Dong Yu 30 41 0 26 Sep 2019
Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes Noémien Kocher Christian Scuito Lorenzo Tarantino Alexandros Lazaridis Andreas Fischer C. Musat 23 0 0 18 Sep 2019
Relaxed Softmax for learning from Positive and Unlabeled data Ugo Tanielian Flavian Vasile 18 9 0 17 Sep 2019
PaLM: A Hybrid Parser and Language Model Hao Peng Roy Schwartz Noah A. Smith AIMat 23 15 0 04 Sep 2019
Efficient Novelty-Driven Neural Architecture Search Miao Zhang Huiqi Li Shirui Pan Taoping Liu Steven W. Su 23 1 0 22 Jul 2019
ER-AE: Differentially Private Text Generation for Authorship Anonymization Haohan Bo Steven H. H. Ding Benjamin C. M. Fung Farkhund Iqbal DeLMO 39 38 0 20 Jul 2019
Evaluating Computational Language Models with Scaling Properties of Natural Language Shuntaro Takahashi Kumiko Tanaka-Ishii 16 23 0 22 Jun 2019
Barack's Wife Hillary: Using Knowledge-Graphs for Fact-Aware Language Modeling Robert L. Logan IV Nelson F. Liu Matthew E. Peters Matt Gardner Sameer Singh RALM 25 186 0 17 Jun 2019
Learning Representations by Maximizing Mutual Information Across Views Philip Bachman R. Devon Hjelm William Buchwalter SSL 72 1,455 0 03 Jun 2019