ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

19 August 2018
Taku Kudo
John Richardson
arXiv:1808.06226

Papers citing "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"

50 / 1,950 papers shown
Non-Autoregressive Machine Translation with Latent Alignments
Chitwan Saharia
William Chan
Saurabh Saxena
Mohammad Norouzi
90
159
0
16 Apr 2020
Analyzing analytical methods: The case of phonology in neural models of spoken language
Grzegorz Chrupała
Bertrand Higy
Afra Alishahi
58
20
0
15 Apr 2020
Balancing Training for Multilingual Neural Machine Translation
Xinyi Wang
Yulia Tsvetkov
Graham Neubig
122
101
0
14 Apr 2020
CLUE: A Chinese Language Understanding Evaluation Benchmark
Liang Xu
Hai Hu
Xuanwei Zhang
Lu Li
Chenjie Cao
...
Cong Yue
Xinrui Zhang
Zhen-Yi Yang
Kyle Richardson
Zhenzhong Lan
ELM
110
388
0
13 Apr 2020
On the Language Neutrality of Pre-trained Multilingual Representations
Jindrich Libovický
Rudolf Rosa
Alexander Fraser
81
107
0
09 Apr 2020
Translation Artifacts in Cross-lingual Transfer Learning
Mikel Artetxe
Gorka Labaka
Eneko Agirre
65
121
0
09 Apr 2020
Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation
Stig-Arne Gronroos
Sami Virpioja
M. Kurimo
63
6
0
08 Apr 2020
Learning Discrete Structured Representations by Adversarially Maximizing Mutual Information
K. Stratos
Sam Wiseman
SSL, DRL
21
7
0
08 Apr 2020
Byte Pair Encoding is Suboptimal for Language Model Pretraining
Kaj Bostrom
Greg Durrett
69
214
0
07 Apr 2020
KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding
Jiyeon Ham
Yo Joong Choe
Kyubyong Park
Ilji Choi
Hyungjoon Soh
67
78
0
07 Apr 2020
Improving Fluency of Non-Autoregressive Machine Translation
Zdeněk Kasner
Jindrich Libovický
Jindřich Helcl
AI4CE
50
8
0
07 Apr 2020
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Zhiqing Sun
Hongkun Yu
Xiaodan Song
Renjie Liu
Yiming Yang
Denny Zhou
MQ
132
820
0
06 Apr 2020
Meta-Learning for Few-Shot NMT Adaptation
Amr Sharaf
Hany Hassan
Hal Daumé
61
35
0
06 Apr 2020
Machine Translation Pre-training for Data-to-Text Generation -- A Case Study in Czech
Mihir Kale
Scott Roy
51
14
0
05 Apr 2020
Testing pre-trained Transformer models for Lithuanian news clustering
Lukas Stankevicius
M. Lukoševičius
VLM
27
8
0
03 Apr 2020
The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment
Wei Zhou
Wilfried Michel
Kazuki Irie
M. Kitza
Ralf Schluter
Hermann Ney
42
43
0
02 Apr 2020
How Furiously Can Colourless Green Ideas Sleep? Sentence Acceptability in Context
Jey Han Lau
C. S. Armendariz
Shalom Lappin
Matthew Purver
Chang Shu
48
41
0
02 Apr 2020
Low Resource Neural Machine Translation: A Benchmark for Five African Languages
Surafel Melaku Lakew
Matteo Negri
Marco Turchi
AIMat
61
27
0
31 Mar 2020
High Performance Sequence-to-Sequence Model for Streaming Speech Recognition
T. Nguyen
Ngoc-Quan Pham
S. Stueker
A. Waibel
40
7
0
22 Mar 2020
TNT-KID: Transformer-based Neural Tagger for Keyword Identification
Matej Martinc
Blaž Škrlj
Senja Pollak
87
38
0
20 Mar 2020
Enhancing Factual Consistency of Abstractive Summarization
Chenguang Zhu
William Fu-Hinthorn
Ruochen Xu
Qingkai Zeng
Michael Zeng
Xuedong Huang
Meng Jiang
HILMKELM
270
40
0
19 Mar 2020
Document Ranking with a Pretrained Sequence-to-Sequence Model
Rodrigo Nogueira
Zhiying Jiang
Jimmy J. Lin
102
587
0
14 Mar 2020
Video Caption Dataset for Describing Human Actions in Japanese
Yutaro Shigeto
Yuya Yoshikawa
Jiaqing Lin
A. Takeuchi
29
3
0
10 Mar 2020
Morfessor EM+Prune: Improved Subword Segmentation with Expectation Maximization and Pruning
Stig-Arne Gronroos
Sami Virpioja
M. Kurimo
VLM
84
21
0
06 Mar 2020
What the [MASK]? Making Sense of Language-Specific BERT Models
Debora Nozza
Federico Bianchi
Dirk Hovy
162
108
0
05 Mar 2020
PhoBERT: Pre-trained language models for Vietnamese
Dat Quoc Nguyen
A. Nguyen
270
357
0
02 Mar 2020
SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation
Arya D. McCarthy
Liezl Puzon
J. Pino
77
24
0
27 Feb 2020
Language-Independent Tokenisation Rivals Language-Specific Tokenisation for Word Similarity Prediction
Danushka Bollegala
Ryuichi Kiryo
K. Tsujino
Haruki Yukawa
21
7
0
25 Feb 2020
Semi-Supervised Speech Recognition via Local Prior Matching
Wei-Ning Hsu
Ann Lee
Gabriel Synnaeve
Awni Y. Hannun
SSL
133
31
0
24 Feb 2020
Modelling Latent Skills for Multitask Language Generation
Kris Cao
Dani Yogatama
31
3
0
21 Feb 2020
Imputer: Sequence Modelling via Imputation and Dynamic Programming
William Chan
Chitwan Saharia
Geoffrey E. Hinton
Mohammad Norouzi
Navdeep Jaitly
BDLAI4TS
95
116
0
20 Feb 2020
Estimating Training Data Influence by Tracing Gradient Descent
G. Pruthi
Frederick Liu
Mukund Sundararajan
Satyen Kale
TDI
142
418
0
19 Feb 2020
Controlling Computation versus Quality for Neural Sequence Models
Ankur Bapna
N. Arivazhagan
Orhan Firat
85
30
0
17 Feb 2020
Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language
Kohei Matsuura
Sei Ueno
Masato Mimura
S. Sakai
Tatsuya Kawahara
CVBM
34
13
0
16 Feb 2020
FQuAD: French Question Answering Dataset
Martin d'Hoffschmidt
Wacim Belblidia
Tom Brendlé
Quentin Heinrich
Maxime Vidal
118
100
0
14 Feb 2020
fastai: A Layered API for Deep Learning
Jeremy Howard
Sylvain Gugger
AI4CE
135
872
0
11 Feb 2020
Learning Coupled Policies for Simultaneous Machine Translation using Imitation Learning
Philip Arthur
Trevor Cohn
Gholamreza Haffari
99
18
0
11 Feb 2020
Accelerating RNN Transducer Inference via One-Step Constrained Beam Search
Juntae Kim
Yoonhan Lee
67
24
0
10 Feb 2020
A Multilingual View of Unsupervised Machine Translation
Xavier Garcia
Pierre Foret
Thibault Sellam
Ankur P. Parikh
122
37
0
07 Feb 2020
A deep-learning view of chemical space designed to facilitate drug discovery
P. Maragakis
Hunter M. Nisonoff
B. Cole
D. Shaw
100
30
0
07 Feb 2020
Graph Constrained Reinforcement Learning for Natural Language Action Spaces
Prithviraj Ammanabrolu
Matthew J. Hausknecht
AI4CE, LLMAG
86
129
0
23 Jan 2020
Pre-training via Leveraging Assisting Languages and Data Selection for Neural Machine Translation
Haiyue Song
Raj Dabre
Zhuoyuan Mao
Fei Cheng
Sadao Kurohashi
Eiichiro Sumita
43
2
0
23 Jan 2020
Multilingual Denoising Pre-training for Neural Machine Translation
Yinhan Liu
Jiatao Gu
Naman Goyal
Xian Li
Sergey Edunov
Marjan Ghazvininejad
M. Lewis
Luke Zettlemoyer
AI4CE, AIMat
128
1,817
0
22 Jan 2020
Normalization of Input-output Shared Embeddings in Text Generation Models
Jinyang Liu
Yujia Zhai
Zizhong Chen
37
0
0
22 Jan 2020
Unsupervised Sentiment Analysis for Code-mixed Data
Siddharth Yadav
Tanmoy Chakraborty
47
15
0
20 Jan 2020
Streaming automatic speech recognition with the transformer model
Niko Moritz
Takaaki Hori
Jonathan Le Roux
142
187
0
08 Jan 2020
Language Models Are An Effective Patient Representation Learning Technique For Electronic Health Record Data
E. Steinberg
Kenneth Jung
Jason Alan Fries
Conor K. Corbin
Stephen Pfohl
N. Shah
94
111
0
06 Jan 2020
Exploring Benefits of Transfer Learning in Neural Machine Translation
Tom Kocmi
57
17
0
06 Jan 2020
A Comprehensive Survey of Multilingual Neural Machine Translation
Raj Dabre
Chenhui Chu
Anoop Kunchukuttan
LRM
116
33
0
04 Jan 2020
TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising
Ziyi Yang
Chenguang Zhu
R. Gmyr
Michael Zeng
Xuedong Huang
Eric Darve
99
61
0
03 Jan 2020