Norm of Mean Contextualized Embeddings Determines their Variance

17 September 2024

Papers citing "Norm of Mean Contextualized Embeddings Determines their Variance"

Title | Authors | Tags | Counts | Date
Norm of Word Embedding Encodes Information Gain | Momose Oyama, Sho Yokoi, Hidetoshi Shimodaira | | 45, 11, 0 | 19 Dec 2022
Problems with Cosine as a Measure of Embedding Similarity for High Frequency Words | Kaitlyn Zhou, Kawin Ethayarajh, Dallas Card, Dan Jurafsky | | 60, 66, 0 | 10 May 2022
Learning to Remove: Towards Isotropic Pre-trained BERT Embedding | Y. Liang, Rui Cao, Jie Zheng, Jie Ren, Ling Gao | SSL | 125, 28, 0 | 12 Apr 2021
Word Rotator's Distance | Sho Yokoi, Ryo Takahashi, Reina Akama, Jun Suzuki, Kentaro Inui | OT | 31, 58, 0 | 30 Apr 2020
On Layer Normalization in the Transformer Architecture | Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu | AI4CE | 98, 973, 0 | 12 Feb 2020
RoBERTa: A Robustly Optimized BERT Pretraining Approach | Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov | AIMat | 435, 24,160, 0 | 26 Jul 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova | VLM, SSL, SSeg | 1.2K, 93,936, 0 | 11 Oct 2018
Deep contextualized word representations | Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer | NAI | 139, 11,520, 0 | 15 Feb 2018
Distributed Representations of Words and Phrases and their Compositionality | Tomas Mikolov, Ilya Sutskever, Kai Chen, G. Corrado, J. Dean | NAI, OCL | 315, 33,445, 0 | 16 Oct 2013