ResearchTrend.AI

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

19 August 2018
Taku Kudo
John Richardson
arXiv:1808.06226

Papers citing "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing"

50 / 1,923 papers shown
Language Model Prior for Low-Resource Neural Machine Translation
Christos Baziotis
Barry Haddow
Alexandra Birch
18
53
0
30 Apr 2020
Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations
Arturo Oncevay
Barry Haddow
Alexandra Birch
19
34
0
30 Apr 2020
Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation
Asa Cooper Stickland
Xian Li
Marjan Ghazvininejad
36
44
0
30 Apr 2020
Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding
Samson Tan
Chenyu You
Lav Varshney
Min-Yen Kan
17
34
0
30 Apr 2020
Enriched Pre-trained Transformers for Joint Slot Filling and Intent Detection
Momchil Hardalov
Ivan Koychev
Preslav Nakov
VLM
28
17
0
30 Apr 2020
Vocabulary Adaptation for Distant Domain Adaptation in Neural Machine Translation
Shoetsu Sato
Jin Sakuma
Naoki Yoshinaga
Masashi Toyoda
M. Kitsuregawa
30
3
0
30 Apr 2020
Self-Supervised and Controlled Multi-Document Opinion Summarization
Hady ElSahar
Maximin Coavoux
Matthias Gallé
Jos Rozen
29
48
0
30 Apr 2020
Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing
Brian Thompson
Matt Post
LRM
19
188
0
30 Apr 2020
WT5?! Training Text-to-Text Models to Explain their Predictions
Sharan Narang
Colin Raffel
Katherine Lee
Adam Roberts
Noah Fiedel
Karishma Malkan
25
197
0
30 Apr 2020
Simulated Multiple Reference Training Improves Low-Resource Machine Translation
Huda Khayrallah
Brian Thompson
Matt Post
Philipp Koehn
20
38
0
30 Apr 2020
An Empirical Study of Pre-trained Transformers for Arabic Information Extraction
Wuwei Lan
Yang Chen
Wei Xu
Alan Ritter
22
4
0
30 Apr 2020
Bilingual Text Extraction as Reading Comprehension
Katsuki Chousa
Masaaki Nagata
Masaaki Nishino
11
0
0
29 Apr 2020
Zero-shot Neural Passage Retrieval via Domain-targeted Synthetic Question Generation
Ji Ma
I. Korotkov
Yinfei Yang
Keith B. Hall
Ryan T. McDonald
RALM
25
32
0
29 Apr 2020
Towards Reasonably-Sized Character-Level Transformer NMT by Finetuning Subword Systems
Jindrich Libovický
Alexander Fraser
16
0
0
29 Apr 2020
Adversarial Subword Regularization for Robust Neural Machine Translation
Jungsoo Park
Mujeen Sung
Jinhyuk Lee
Jaewoo Kang
22
8
0
29 Apr 2020
Multiresolution and Multimodal Speech Recognition with Transformers
Georgios Paraskevopoulos
Srinivas Parthasarathy
Aparna Khare
Shiva Sundaram
25
29
0
29 Apr 2020
Fast and Memory-Efficient Neural Code Completion
Alexey Svyatkovskiy
Sebastian Lee
A. Hadjitofi
M. Riechert
Juliana Franco
Miltiadis Allamanis
6
91
0
28 Apr 2020
Curriculum Pre-training for End-to-End Speech Translation
Chengyi Wang
Yu Wu
Shujie Liu
Ming Zhou
Zhenglu Yang
21
108
0
21 Apr 2020
Adversarial Training for Large Neural Language Models
Xiaodong Liu
Hao Cheng
Pengcheng He
Weizhu Chen
Yu-Chiang Frank Wang
Hoifung Poon
Jianfeng Gao
AAML
34
183
0
20 Apr 2020
Enriching the Transformer with Linguistic Factors for Low-Resource Machine Translation
Jordi Armengol-Estapé
Marta R. Costa-jussà
Carlos Escolano
15
7
0
17 Apr 2020
Cross-lingual Contextualized Topic Models with Zero-shot Learning
Federico Bianchi
Silvia Terragni
Dirk Hovy
Debora Nozza
Elisabetta Fersini
BDL
34
145
0
16 Apr 2020
Non-Autoregressive Machine Translation with Latent Alignments
Chitwan Saharia
William Chan
Saurabh Saxena
Mohammad Norouzi
19
157
0
16 Apr 2020
Analyzing analytical methods: The case of phonology in neural models of spoken language
Grzegorz Chrupała
Bertrand Higy
A. Alishahi
21
20
0
15 Apr 2020
Balancing Training for Multilingual Neural Machine Translation
Xinyi Wang
Yulia Tsvetkov
Graham Neubig
23
99
0
14 Apr 2020
CLUE: A Chinese Language Understanding Evaluation Benchmark
Liang Xu
Hai Hu
Xuanwei Zhang
Lu Li
Chenjie Cao
...
Cong Yue
Xinrui Zhang
Zhen-Yi Yang
Kyle Richardson
Zhenzhong Lan
ELM
45
379
0
13 Apr 2020
On the Language Neutrality of Pre-trained Multilingual Representations
Jindrich Libovický
Rudolf Rosa
Alexander Fraser
25
101
0
09 Apr 2020
Translation Artifacts in Cross-lingual Transfer Learning
Mikel Artetxe
Gorka Labaka
Eneko Agirre
27
115
0
09 Apr 2020
Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation
Stig-Arne Gronroos
Sami Virpioja
M. Kurimo
28
6
0
08 Apr 2020
Learning Discrete Structured Representations by Adversarially Maximizing Mutual Information
K. Stratos
Sam Wiseman
SSL
DRL
11
7
0
08 Apr 2020
Byte Pair Encoding is Suboptimal for Language Model Pretraining
Kaj Bostrom
Greg Durrett
28
200
0
07 Apr 2020
KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding
Jiyeon Ham
Yo Joong Choe
Kyubyong Park
Ilji Choi
Hyungjoon Soh
19
78
0
07 Apr 2020
Improving Fluency of Non-Autoregressive Machine Translation
Zdeněk Kasner
Jindrich Libovický
Jindřich Helcl
AI4CE
14
8
0
07 Apr 2020
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Zhiqing Sun
Hongkun Yu
Xiaodan Song
Renjie Liu
Yiming Yang
Denny Zhou
MQ
27
797
0
06 Apr 2020
Meta-Learning for Few-Shot NMT Adaptation
Amr Sharaf
Hany Hassan
Hal Daumé
19
35
0
06 Apr 2020
Machine Translation Pre-training for Data-to-Text Generation -- A Case Study in Czech
Mihir Kale
Scott Roy
14
14
0
05 Apr 2020
Testing pre-trained Transformer models for Lithuanian news clustering
Lukas Stankevicius
M. Lukoševičius
VLM
25
8
0
03 Apr 2020
The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment
Wei Zhou
Wilfried Michel
Kazuki Irie
M. Kitza
Ralf Schluter
Hermann Ney
11
42
0
02 Apr 2020
How Furiously Can Colourless Green Ideas Sleep? Sentence Acceptability in Context
Jey Han Lau
C. S. Armendariz
Shalom Lappin
Matthew Purver
Chang Shu
14
40
0
02 Apr 2020
Low Resource Neural Machine Translation: A Benchmark for Five African Languages
Surafel Melaku Lakew
Matteo Negri
Marco Turchi
AIMat
17
27
0
31 Mar 2020
High Performance Sequence-to-Sequence Model for Streaming Speech Recognition
T. Nguyen
Ngoc-Quan Pham
S. Stueker
A. Waibel
11
7
0
22 Mar 2020
TNT-KID: Transformer-based Neural Tagger for Keyword Identification
Matej Martinc
Blaž Škrlj
Senja Pollak
24
38
0
20 Mar 2020
Enhancing Factual Consistency of Abstractive Summarization
Chenguang Zhu
William Fu-Hinthorn
Ruochen Xu
Qingkai Zeng
Michael Zeng
Xuedong Huang
Meng Jiang
HILM
KELM
193
40
0
19 Mar 2020
Document Ranking with a Pretrained Sequence-to-Sequence Model
Rodrigo Nogueira
Zhiying Jiang
Jimmy J. Lin
31
560
0
14 Mar 2020
Video Caption Dataset for Describing Human Actions in Japanese
Yutaro Shigeto
Yuya Yoshikawa
Jiaqing Lin
A. Takeuchi
20
3
0
10 Mar 2020
Morfessor EM+Prune: Improved Subword Segmentation with Expectation Maximization and Pruning
Stig-Arne Gronroos
Sami Virpioja
M. Kurimo
VLM
39
21
0
06 Mar 2020
What the [MASK]? Making Sense of Language-Specific BERT Models
Debora Nozza
Federico Bianchi
Dirk Hovy
92
106
0
05 Mar 2020
PhoBERT: Pre-trained language models for Vietnamese
Dat Quoc Nguyen
A. Nguyen
174
343
0
02 Mar 2020
SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation
Arya D. McCarthy
Liezl Puzon
J. Pino
33
24
0
27 Feb 2020
Language-Independent Tokenisation Rivals Language-Specific Tokenisation for Word Similarity Prediction
Danushka Bollegala
Ryuichi Kiryo
K. Tsujino
Haruki Yukawa
16
7
0
25 Feb 2020
Semi-Supervised Speech Recognition via Local Prior Matching
Wei-Ning Hsu
Ann Lee
Gabriel Synnaeve
Awni Y. Hannun
SSL
27
31
0
24 Feb 2020