ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1608.05859
  4. Cited By
Using the Output Embedding to Improve Language Models

Using the Output Embedding to Improve Language Models

20 August 2016
Ofir Press
Lior Wolf
ArXivPDFHTML

Papers citing "Using the Output Embedding to Improve Language Models"

50 / 146 papers shown
Title
Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput
Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput
Bo Zhang
Shuo Li
Runhe Tian
Yang Yang
Jixin Tang
Jinhao Zhou
Lin Ma
VLM
31
0
0
14 May 2025
Hierarchical Multi-Label Generation with Probabilistic Level-Constraint
Hierarchical Multi-Label Generation with Probabilistic Level-Constraint
Linqing Chen
Weilei Wang
Wentao Wu
Hanmeng Zhong
37
0
0
30 Apr 2025
Merging Feed-Forward Sublayers for Compressed Transformers
Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma
Kenton W. Murray
Kevin Duh
AI4CE
50
0
0
10 Jan 2025
Interchangeable Token Embeddings for Extendable Vocabulary and Alpha-Equivalence
Interchangeable Token Embeddings for Extendable Vocabulary and Alpha-Equivalence
İlker Işık
R. G. Cinbis
Ebru Aydin Gol
36
0
0
22 Oct 2024
Self-calibration for Language Model Quantization and Pruning
Self-calibration for Language Model Quantization and Pruning
Miles Williams
G. Chrysostomou
Nikolaos Aletras
MQ
165
0
0
22 Oct 2024
What makes math problems hard for reinforcement learning: a case study
What makes math problems hard for reinforcement learning: a case study
Ali Shehper
A. Medina-Mardones
Lucas Fagan
Angus Gruen
Piotr Kucharski
Sergei Gukov
Piotr Kucharski
Zhenghan Wang
Sergei Gukov
32
3
0
27 Aug 2024
Retrieval-augmented code completion for local projects using large
  language models
Retrieval-augmented code completion for local projects using large language models
Marko Hostnik
Marko Robnik-Sikonja
RALM
35
0
0
09 Aug 2024
Beyond Scaling Laws: Understanding Transformer Performance with
  Associative Memory
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Xueyan Niu
Bo Bai
Lei Deng
Wei Han
39
6
0
14 May 2024
Understanding the effects of word-level linguistic annotations in
  under-resourced neural machine translation
Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation
Víctor M. Sánchez-Cartagena
J. A. Pérez-Ortiz
F. Sánchez-Martínez
21
5
0
29 Jan 2024
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language
  Models with 3D Parallelism
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Yanxi Chen
Xuchen Pan
Yaliang Li
Bolin Ding
Jingren Zhou
LRM
41
31
0
08 Dec 2023
Object Recognition as Next Token Prediction
Object Recognition as Next Token Prediction
Kaiyu Yue
Borchun Chen
Jonas Geiping
Hengduo Li
Tom Goldstein
Ser-Nam Lim
40
9
0
04 Dec 2023
The mechanistic basis of data dependence and abrupt learning in an
  in-context classification task
The mechanistic basis of data dependence and abrupt learning in an in-context classification task
Gautam Reddy
27
52
0
03 Dec 2023
Compositional Capabilities of Autoregressive Transformers: A Study on
  Synthetic, Interpretable Tasks
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Rahul Ramesh
Ekdeep Singh Lubana
Mikail Khona
Robert P. Dick
Hidenori Tanaka
CoGe
39
7
0
21 Nov 2023
Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying
Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying
Adithya Renduchintala
Tugrul Konuk
Oleksii Kuchaiev
MoMe
29
42
0
16 Nov 2023
Copilot4D: Learning Unsupervised World Models for Autonomous Driving via
  Discrete Diffusion
Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion
Lunjun Zhang
Yuwen Xiong
Ze Yang
Sergio Casas
Rui Hu
R. Urtasun
47
51
0
02 Nov 2023
Large-Scale and Multi-Perspective Opinion Summarization with Diverse
  Review Subsets
Large-Scale and Multi-Perspective Opinion Summarization with Diverse Review Subsets
Han Jiang
Rui Wang
Zhihua Wei
Yu Li
Xinpeng Wang
37
4
0
20 Oct 2023
Toward Joint Language Modeling for Speech Units and Text
Toward Joint Language Modeling for Speech Units and Text
Ju-Chieh Chou
Chung-Ming Chien
Wei-Ning Hsu
Karen Livescu
Arun Babu
Alexis Conneau
Alexei Baevski
Michael Auli
VLM
28
20
0
12 Oct 2023
Small-scale proxies for large-scale Transformer training instabilities
Small-scale proxies for large-scale Transformer training instabilities
Mitchell Wortsman
Peter J. Liu
Lechao Xiao
Katie Everett
A. Alemi
...
Jascha Narain Sohl-Dickstein
Kelvin Xu
Jaehoon Lee
Justin Gilmer
Simon Kornblith
40
84
0
25 Sep 2023
Using fine-tuning and min lookahead beam search to improve Whisper
Using fine-tuning and min lookahead beam search to improve Whisper
Andrea Do
Oscar Brown
Zhengjie Wang
Nikhil Mathew
Zixin Liu
Jawwad Ahmed
Cheng Yu
35
1
0
19 Sep 2023
Long-range Language Modeling with Self-retrieval
Long-range Language Modeling with Self-retrieval
Ohad Rubin
Jonathan Berant
RALM
KELM
19
18
0
23 Jun 2023
Backpack Language Models
Backpack Language Models
John Hewitt
John Thickstun
Christopher D. Manning
Percy Liang
KELM
16
16
0
26 May 2023
Exploring Representational Disparities Between Multilingual and
  Bilingual Translation Models
Exploring Representational Disparities Between Multilingual and Bilingual Translation Models
Neha Verma
Kenton W. Murray
Kevin Duh
19
0
0
23 May 2023
Beyond Shared Vocabulary: Increasing Representational Word Similarities
  across Languages for Multilingual Machine Translation
Beyond Shared Vocabulary: Increasing Representational Word Similarities across Languages for Multilingual Machine Translation
Di Wu
Christof Monz
42
9
0
23 May 2023
When Does Monolingual Data Help Multilingual Translation: The Role of
  Domain and Model Scale
When Does Monolingual Data Help Multilingual Translation: The Role of Domain and Model Scale
Christos Baziotis
Biao Zhang
Alexandra Birch
Barry Haddow
30
2
0
23 May 2023
Learning Language-Specific Layers for Multilingual Machine Translation
Learning Language-Specific Layers for Multilingual Machine Translation
Telmo Pires
Robin M. Schmidt
Yi-Hsiu Liao
Stephan Peitz
42
17
0
04 May 2023
Effective Theory of Transformers at Initialization
Effective Theory of Transformers at Initialization
Emily Dinan
Sho Yaida
Susan Zhang
30
14
0
04 Apr 2023
Machine Learning for Brain Disorders: Transformers and Visual
  Transformers
Machine Learning for Brain Disorders: Transformers and Visual Transformers
Robin Courant
Maika Edberg
Nicolas Dufour
Vicky Kalogeiton
MedIm
ViT
32
1
0
21 Mar 2023
Transformadores: Fundamentos teoricos y Aplicaciones
Transformadores: Fundamentos teoricos y Aplicaciones
J. D. L. Torre
78
0
0
18 Feb 2023
Byte Pair Encoding for Symbolic Music
Byte Pair Encoding for Symbolic Music
Nathan Fradet
Nicolas Gutowski
F. Chhel
Jean-Pierre Briot
29
15
0
27 Jan 2023
Breaking the Representation Bottleneck of Chinese Characters: Neural
  Machine Translation with Stroke Sequence Modeling
Breaking the Representation Bottleneck of Chinese Characters: Neural Machine Translation with Stroke Sequence Modeling
Zhijun Wang
Xuebo Liu
Min Zhang
27
11
0
23 Nov 2022
Circling Back to Recurrent Models of Language
Circling Back to Recurrent Models of Language
Gábor Melis
40
0
0
03 Nov 2022
Two Models are Better than One: Federated Learning Is Not Private For
  Google GBoard Next Word Prediction
Two Models are Better than One: Federated Learning Is Not Private For Google GBoard Next Word Prediction
Mohamed Suliman
D. Leith
SILM
FedML
26
7
0
30 Oct 2022
Bilingual Synchronization: Restoring Translational Relationships with
  Editing Operations
Bilingual Synchronization: Restoring Translational Relationships with Editing Operations
Jitao Xu
Josep Crego
François Yvon
35
4
0
24 Oct 2022
Collaborative Image Understanding
Collaborative Image Understanding
Koby Bibas
Oren Sar Shalom
Dietmar Jannach
VLM
24
2
0
21 Oct 2022
Analyzing Transformers in Embedding Space
Analyzing Transformers in Embedding Space
Guy Dar
Mor Geva
Ankit Gupta
Jonathan Berant
24
83
0
06 Sep 2022
Training Large-Vocabulary Neural Language Models by Private Federated
  Learning for Resource-Constrained Devices
Training Large-Vocabulary Neural Language Models by Private Federated Learning for Resource-Constrained Devices
Mingbin Xu
Congzheng Song
Ye Tian
Neha Agrawal
Filip Granqvist
...
Shiyi Han
Yaqiao Deng
Leo Liu
Anmol Walia
Alex Jin
FedML
15
22
0
18 Jul 2022
Understanding and Mitigating the Uncertainty in Zero-Shot Translation
Understanding and Mitigating the Uncertainty in Zero-Shot Translation
Wenxuan Wang
Wenxiang Jiao
Shuo Wang
Zhaopeng Tu
Michael R. Lyu
UQLM
35
9
0
20 May 2022
Twist Decoding: Diverse Generators Guide Each Other
Twist Decoding: Diverse Generators Guide Each Other
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Hao Peng
Ximing Lu
Dragomir R. Radev
Yejin Choi
Noah A. Smith
SyDa
27
4
0
19 May 2022
Joint Generation of Captions and Subtitles with Dual Decoding
Joint Generation of Captions and Subtitles with Dual Decoding
Jitao Xu
François Buet
Josep Crego
Elise Bertin-Lemée
François Yvon
24
8
0
13 May 2022
CoCoA-MT: A Dataset and Benchmark for Contrastive Controlled MT with
  Application to Formality
CoCoA-MT: A Dataset and Benchmark for Contrastive Controlled MT with Application to Formality
Maria Nadejde
Anna Currey
B. Hsu
Xing Niu
Marcello Federico
Georgiana Dinu
22
24
0
09 May 2022
Balancing Multi-Domain Corpora Learning for Open-Domain Response
  Generation
Balancing Multi-Domain Corpora Learning for Open-Domain Response Generation
Yujie Xing
Jason (Jinglun) Cai
Nils Barlaug
Peng Liu
J. Gulla
29
4
0
05 May 2022
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo
  Languages
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
Felix Wu
Kwangyoun Kim
Shinji Watanabe
Kyu Jeong Han
Ryan T. McDonald
Kilian Q. Weinberger
Yoav Artzi
SyDa
48
37
0
02 May 2022
Linearizing Transformer with Key-Value Memory
Linearizing Transformer with Key-Value Memory
Yizhe Zhang
Deng Cai
20
5
0
23 Mar 2022
Pretraining with Artificial Language: Studying Transferable Knowledge in
  Language Models
Pretraining with Artificial Language: Studying Transferable Knowledge in Language Models
Ryokan Ri
Yoshimasa Tsuruoka
32
25
0
19 Mar 2022
Model soups: averaging weights of multiple fine-tuned models improves
  accuracy without increasing inference time
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Mitchell Wortsman
Gabriel Ilharco
S. Gadre
Rebecca Roelofs
Raphael Gontijo-Lopes
...
Hongseok Namkoong
Ali Farhadi
Y. Carmon
Simon Kornblith
Ludwig Schmidt
MoMe
54
916
1
10 Mar 2022
One-Shot Learning from a Demonstration with Hierarchical Latent Language
One-Shot Learning from a Demonstration with Hierarchical Latent Language
Nathaniel Weir
Xingdi Yuan
Marc-Alexandre Côté
Matthew J. Hausknecht
Romain Laroche
Ida Momennejad
H. V. Seijen
Benjamin Van Durme
BDL
24
6
0
09 Mar 2022
ACORT: A Compact Object Relation Transformer for Parameter Efficient
  Image Captioning
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning
J. Tan
Y. Tan
C. Chan
Joon Huang Chuah
VLM
ViT
26
15
0
11 Feb 2022
ShuttleNet: Position-aware Fusion of Rally Progress and Player Styles
  for Stroke Forecasting in Badminton
ShuttleNet: Position-aware Fusion of Rally Progress and Player Styles for Stroke Forecasting in Badminton
Wei-Yao Wang
Hong-Han Shuai
Kai-Shiang Chang
Wen-Chih Peng
31
43
0
02 Dec 2021
How much do language models copy from their training data? Evaluating
  linguistic novelty in text generation using RAVEN
How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN
R. Thomas McCoy
P. Smolensky
Tal Linzen
Jianfeng Gao
Asli Celikyilmaz
SyDa
25
119
0
18 Nov 2021
Faithful Target Attribute Prediction in Neural Machine Translation
Faithful Target Attribute Prediction in Neural Machine Translation
Xing Niu
Georgiana Dinu
Prashant Mathur
Anna Currey
18
4
0
24 Sep 2021
123
Next