2210.11803
Revisiting Checkpoint Averaging for Neural Machine Translation
21 October 2022
Yingbo Gao
Christian Herold
Zijian Yang
Hermann Ney
MoMe
Papers citing "Revisiting Checkpoint Averaging for Neural Machine Translation"
24 / 24 papers shown
MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection
Zhuoxiao Chen
Junjie Meng
Mahsa Baktashmotlagh
Yonggang Zhang
Zi Huang
Yadan Luo
163
2
0
21 Jun 2024
Merging Models with Fisher-Weighted Averaging
Michael Matena
Colin Raffel
FedML
MoMe
87
402
0
18 Nov 2021
NVIDIA NeMo Neural Machine Translation Systems for English-German and English-Russian News and Biomedical Tasks at WMT21
Sandeep Subramanian
Oleksii Hrinchuk
Virginia Adams
Oleksii Kuchaiev
VLM
47
17
0
16 Nov 2021
Boost Neural Networks by Checkpoints
Feng Wang
Gu-Yeon Wei
Qiao Liu
Jinxiang Ou
Xian Wei
Hairong Lv
FedML
UQCV
45
10
0
03 Oct 2021
Scalable and Efficient MoE Training for Multitask Multilingual Models
Young Jin Kim
A. A. Awan
Alexandre Muzio
Andres Felipe Cruz Salinas
Liyang Lu
Amr Hendy
Samyam Rajbhandari
Yuxiong He
Hany Awadalla
MoE
137
84
0
22 Sep 2021
Facebook AI WMT21 News Translation Task Submission
C. Tran
Shruti Bhosale
James Cross
Philipp Koehn
Sergey Edunov
Angela Fan
VLM
182
82
0
06 Aug 2021
A Comparative Study on Neural Architectures and Training Methods for Japanese Speech Recognition
Shigeki Karita
Yotaro Kubo
M. Bacchiani
Llion Jones
39
13
0
09 Jun 2021
Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition
Yu Zhang
James Qin
Daniel S. Park
Wei Han
Chung-Cheng Chiu
Ruoming Pang
Quoc V. Le
Yonghui Wu
VLM
SSL
199
310
0
20 Oct 2020
Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict
Yosuke Higuchi
Shinji Watanabe
Nanxin Chen
Tetsuji Ogawa
Tetsunori Kobayashi
57
138
0
18 May 2020
Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition
Zhengkun Tian
Jiangyan Yi
J. Tao
Ye Bai
Shuai Zhang
Zhengqi Wen
75
54
0
16 May 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
520
42,559
0
03 Dec 2019
MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning
Guangxiang Zhao
Xu Sun
Jingjing Xu
Zhiyuan Zhang
Liangchen Luo
LRM
48
49
0
17 Nov 2019
Transformer-based Acoustic Modeling for Hybrid Speech Recognition
Yongqiang Wang
Abdel-rahman Mohamed
Duc Le
Chunxi Liu
Alex Xiao
...
Xiaohui Zhang
Frank Zhang
Christian Fuegen
Geoffrey Zweig
M. Seltzer
52
249
0
22 Oct 2019
A Comparative Study on Transformer vs RNN in Speech Applications
Shigeki Karita
Nanxin Chen
Tomoki Hayashi
Takaaki Hori
Hirofumi Inaguma
...
Ryuichi Yamamoto
Xiao-fei Wang
Shinji Watanabe
Takenori Yoshimura
Wangyou Zhang
74
721
0
13 Sep 2019
CUED@WMT19:EWC&LMs
Felix Stahlberg
Danielle Saunders
Adria de Gispert
Bill Byrne
43
13
0
11 Jun 2019
Training Tips for the Transformer Model
Martin Popel
Ondrej Bojar
70
311
0
01 Apr 2018
Averaging Weights Leads to Wider Optima and Better Generalization
Pavel Izmailov
Dmitrii Podoprikhin
T. Garipov
Dmitry Vetrov
A. Wilson
FedML
MoMe
135
1,669
0
14 Mar 2018
Self-Attention with Relative Position Representations
Peter Shaw
Jakob Uszkoreit
Ashish Vaswani
177
2,295
0
06 Mar 2018
Checkpoint Ensembles: Ensemble Methods from a Single Training Process
Hugh Chen
Scott M. Lundberg
Su-In Lee
UQCV
35
62
0
09 Oct 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
722
132,199
0
12 Jun 2017
Snapshot Ensembles: Train 1, get M for free
Gao Huang
Yixuan Li
Geoff Pleiss
Zhuang Liu
John E. Hopcroft
Kilian Q. Weinberger
OOD
FedML
UQCV
134
951
0
01 Apr 2017
Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions
Marcin Junczys-Dowmunt
Tomasz Dwojak
Hieu T. Hoang
68
199
0
04 Oct 2016
Cyclical Learning Rates for Training Neural Networks
L. Smith
ODL
212
2,533
0
03 Jun 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
1.9K
150,260
0
22 Dec 2014