arXiv:2005.02433
Cited By
Stolen Probability: A Structural Weakness of Neural Language Models
David Demeter, Gregory J. Kimmel, Doug Downey
5 May 2020
Papers citing "Stolen Probability: A Structural Weakness of Neural Language Models" (7 of 7 papers shown):
Norm of Mean Contextualized Embeddings Determines their Variance
Hiroaki Yamagiwa, Hidetoshi Shimodaira
17 Sep 2024

LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization
Peng Lu, Ahmad Rashid, I. Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais
8 May 2023

Why do Nearest Neighbor Language Models Work? [RALM]
Frank F. Xu, Uri Alon, Graham Neubig
7 Jan 2023

Low-Rank Softmax Can Have Unargmaxable Classes in Theory but Rarely in Practice
Andreas Grivas, Nikolay Bogoychev, Adam Lopez
12 Mar 2022

Rare Tokens Degenerate All Tokens: Improving Neural Text Generation via Adaptive Gradient Gating for Rare Token Embeddings
Sangwon Yu, Jongyoon Song, Heeseung Kim, SeongEun Lee, Woo-Jong Ryu, Sung-Hoon Yoon
7 Sep 2021

Query-Key Normalization for Transformers
Alex Henry, Prudhvi Raj Dachapally, S. Pawar, Yuxuan Chen
8 Oct 2020

Improving Low Compute Language Modeling with In-Domain Embedding Initialisation [AI4CE]
Charles F Welch, Rada Mihalcea, Jonathan K. Kummerfeld
29 Sep 2020