Using the Output Embedding to Improve Language Models

20 August 2016

Ofir Press

Lior Wolf

Papers citing "Using the Output Embedding to Improve Language Models"

50 / 146 papers shown

Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput Bo Zhang Shuo Li Runhe Tian Yang Yang Jixin Tang Jinhao Zhou Lin Ma VLM 31 0 0 14 May 2025
Hierarchical Multi-Label Generation with Probabilistic Level-Constraint Linqing Chen Weilei Wang Wentao Wu Hanmeng Zhong 37 0 0 30 Apr 2025
Merging Feed-Forward Sublayers for Compressed Transformers Neha Verma Kenton W. Murray Kevin Duh AI4CE 50 0 0 10 Jan 2025
Interchangeable Token Embeddings for Extendable Vocabulary and Alpha-Equivalence İlker Işık R. G. Cinbis Ebru Aydin Gol 36 0 0 22 Oct 2024
Self-calibration for Language Model Quantization and Pruning Miles Williams G. Chrysostomou Nikolaos Aletras MQ 165 0 0 22 Oct 2024
What makes math problems hard for reinforcement learning: a case study Ali Shehper A. Medina-Mardones Lucas Fagan Angus Gruen Piotr Kucharski Zhenghan Wang Sergei Gukov 32 3 0 27 Aug 2024
Retrieval-augmented code completion for local projects using large language models Marko Hostnik Marko Robnik-Sikonja RALM 35 0 0 09 Aug 2024
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Xueyan Niu Bo Bai Lei Deng Wei Han 39 6 0 14 May 2024
Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation Víctor M. Sánchez-Cartagena J. A. Pérez-Ortiz F. Sánchez-Martínez 21 5 0 29 Jan 2024
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism Yanxi Chen Xuchen Pan Yaliang Li Bolin Ding Jingren Zhou LRM 41 31 0 08 Dec 2023
Object Recognition as Next Token Prediction Kaiyu Yue Borchun Chen Jonas Geiping Hengduo Li Tom Goldstein Ser-Nam Lim 40 9 0 04 Dec 2023
The mechanistic basis of data dependence and abrupt learning in an in-context classification task Gautam Reddy 27 52 0 03 Dec 2023
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks Rahul Ramesh Ekdeep Singh Lubana Mikail Khona Robert P. Dick Hidenori Tanaka CoGe 39 7 0 21 Nov 2023
Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying Adithya Renduchintala Tugrul Konuk Oleksii Kuchaiev MoMe 29 42 0 16 Nov 2023
Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion Lunjun Zhang Yuwen Xiong Ze Yang Sergio Casas Rui Hu R. Urtasun 47 51 0 02 Nov 2023
Large-Scale and Multi-Perspective Opinion Summarization with Diverse Review Subsets Han Jiang Rui Wang Zhihua Wei Yu Li Xinpeng Wang 37 4 0 20 Oct 2023
Toward Joint Language Modeling for Speech Units and Text Ju-Chieh Chou Chung-Ming Chien Wei-Ning Hsu Karen Livescu Arun Babu Alexis Conneau Alexei Baevski Michael Auli VLM 28 20 0 12 Oct 2023
Small-scale proxies for large-scale Transformer training instabilities Mitchell Wortsman Peter J. Liu Lechao Xiao Katie Everett A. Alemi ... Jascha Narain Sohl-Dickstein Kelvin Xu Jaehoon Lee Justin Gilmer Simon Kornblith 40 84 0 25 Sep 2023
Using fine-tuning and min lookahead beam search to improve Whisper Andrea Do Oscar Brown Zhengjie Wang Nikhil Mathew Zixin Liu Jawwad Ahmed Cheng Yu 35 1 0 19 Sep 2023
Long-range Language Modeling with Self-retrieval Ohad Rubin Jonathan Berant RALM KELM 19 18 0 23 Jun 2023
Backpack Language Models John Hewitt John Thickstun Christopher D. Manning Percy Liang KELM 16 16 0 26 May 2023
Exploring Representational Disparities Between Multilingual and Bilingual Translation Models Neha Verma Kenton W. Murray Kevin Duh 19 0 0 23 May 2023
Beyond Shared Vocabulary: Increasing Representational Word Similarities across Languages for Multilingual Machine Translation Di Wu Christof Monz 42 9 0 23 May 2023
When Does Monolingual Data Help Multilingual Translation: The Role of Domain and Model Scale Christos Baziotis Biao Zhang Alexandra Birch Barry Haddow 30 2 0 23 May 2023
Learning Language-Specific Layers for Multilingual Machine Translation Telmo Pires Robin M. Schmidt Yi-Hsiu Liao Stephan Peitz 42 17 0 04 May 2023
Effective Theory of Transformers at Initialization Emily Dinan Sho Yaida Susan Zhang 30 14 0 04 Apr 2023
Machine Learning for Brain Disorders: Transformers and Visual Transformers Robin Courant Maika Edberg Nicolas Dufour Vicky Kalogeiton MedIm ViT 32 1 0 21 Mar 2023
Transformadores: Fundamentos teoricos y Aplicaciones J. D. L. Torre 78 0 0 18 Feb 2023
Byte Pair Encoding for Symbolic Music Nathan Fradet Nicolas Gutowski F. Chhel Jean-Pierre Briot 29 15 0 27 Jan 2023
Breaking the Representation Bottleneck of Chinese Characters: Neural Machine Translation with Stroke Sequence Modeling Zhijun Wang Xuebo Liu Min Zhang 27 11 0 23 Nov 2022
Circling Back to Recurrent Models of Language Gábor Melis 40 0 0 03 Nov 2022
Two Models are Better than One: Federated Learning Is Not Private For Google GBoard Next Word Prediction Mohamed Suliman D. Leith SILM FedML 26 7 0 30 Oct 2022
Bilingual Synchronization: Restoring Translational Relationships with Editing Operations Jitao Xu Josep Crego François Yvon 35 4 0 24 Oct 2022
Collaborative Image Understanding Koby Bibas Oren Sar Shalom Dietmar Jannach VLM 24 2 0 21 Oct 2022
Analyzing Transformers in Embedding Space Guy Dar Mor Geva Ankit Gupta Jonathan Berant 24 83 0 06 Sep 2022
Training Large-Vocabulary Neural Language Models by Private Federated Learning for Resource-Constrained Devices Mingbin Xu Congzheng Song Ye Tian Neha Agrawal Filip Granqvist ... Shiyi Han Yaqiao Deng Leo Liu Anmol Walia Alex Jin FedML 15 22 0 18 Jul 2022
Understanding and Mitigating the Uncertainty in Zero-Shot Translation Wenxuan Wang Wenxiang Jiao Shuo Wang Zhaopeng Tu Michael R. Lyu UQLM 35 9 0 20 May 2022
Twist Decoding: Diverse Generators Guide Each Other Jungo Kasai Keisuke Sakaguchi Ronan Le Bras Hao Peng Ximing Lu Dragomir R. Radev Yejin Choi Noah A. Smith SyDa 27 4 0 19 May 2022
Joint Generation of Captions and Subtitles with Dual Decoding Jitao Xu François Buet Josep Crego Elise Bertin-Lemée François Yvon 24 8 0 13 May 2022
CoCoA-MT: A Dataset and Benchmark for Contrastive Controlled MT with Application to Formality Maria Nadejde Anna Currey B. Hsu Xing Niu Marcello Federico Georgiana Dinu 22 24 0 09 May 2022
Balancing Multi-Domain Corpora Learning for Open-Domain Response Generation Yujie Xing Jason (Jinglun) Cai Nils Barlaug Peng Liu J. Gulla 29 4 0 05 May 2022
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages Felix Wu Kwangyoun Kim Shinji Watanabe Kyu Jeong Han Ryan T. McDonald Kilian Q. Weinberger Yoav Artzi SyDa 48 37 0 02 May 2022
Linearizing Transformer with Key-Value Memory Yizhe Zhang Deng Cai 20 5 0 23 Mar 2022
Pretraining with Artificial Language: Studying Transferable Knowledge in Language Models Ryokan Ri Yoshimasa Tsuruoka 32 25 0 19 Mar 2022
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time Mitchell Wortsman Gabriel Ilharco S. Gadre Rebecca Roelofs Raphael Gontijo-Lopes ... Hongseok Namkoong Ali Farhadi Y. Carmon Simon Kornblith Ludwig Schmidt MoMe 54 916 1 10 Mar 2022
One-Shot Learning from a Demonstration with Hierarchical Latent Language Nathaniel Weir Xingdi Yuan Marc-Alexandre Côté Matthew J. Hausknecht Romain Laroche Ida Momennejad H. V. Seijen Benjamin Van Durme BDL 24 6 0 09 Mar 2022
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning J. Tan Y. Tan C. Chan Joon Huang Chuah VLM ViT 26 15 0 11 Feb 2022
ShuttleNet: Position-aware Fusion of Rally Progress and Player Styles for Stroke Forecasting in Badminton Wei-Yao Wang Hong-Han Shuai Kai-Shiang Chang Wen-Chih Peng 31 43 0 02 Dec 2021
How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN R. Thomas McCoy P. Smolensky Tal Linzen Jianfeng Gao Asli Celikyilmaz SyDa 25 119 0 18 Nov 2021
Faithful Target Attribute Prediction in Neural Machine Translation Xing Niu Georgiana Dinu Prashant Mathur Anna Currey 18 4 0 24 Sep 2021