CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters

20 October 2020

Papers citing "CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters"

30 / 30 papers shown

We're Calling an Intervention: Exploring Fundamental Hurdles in Adapting Language Models to Nonstandard Text Aarohi Srivastava David Chiang 57 0 0 10 Apr 2024
The Impact of Word Splitting on the Semantic Content of Contextualized Word Representations Aina Garí Soler Matthieu Labeau Chloé Clavel VLM 34 2 0 22 Feb 2024
Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data Xinzhe Li Ming Liu Shang Gao MU 25 8 0 02 Jul 2023
Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages Verena Blaschke Hinrich Schütze Barbara Plank 34 14 0 20 Apr 2023
An Information Extraction Study: Take In Mind the Tokenization! Christos Theodoropoulos Marie-Francine Moens 21 6 0 27 Mar 2023
Inducing Character-level Structure in Subword-based Language Models with Type-level Interchange Intervention Training Jing-ling Huang Zhengxuan Wu Kyle Mahowald Christopher Potts 24 13 0 19 Dec 2022
On the State of the Art in Authorship Attribution and Authorship Verification Jacob Tyo Bhuwan Dhingra Zachary Chase Lipton 32 22 0 14 Sep 2022
Review of Natural Language Processing in Pharmacology D. Trajanov Vangel Trajkovski Makedonka Dimitrieva Jovana Dobreva Milos Jovanovik Matej Klemen Aleš Žagar Marko Robnik-Šikonja LM&MA 21 7 0 22 Aug 2022
Cross-lingual Approaches for the Detection of Adverse Drug Reactions in German from a Patient's Perspective Lisa Raithel Philippe E. Thomas Roland Roller Oliver Sapina Sebastian Möller Pierre Zweigenbaum 16 2 0 03 Aug 2022
Language Modelling with Pixels Phillip Rust Jonas F. Lotz Emanuele Bugliarello Elizabeth Salesky Miryam de Lhoneux Desmond Elliott VLM 30 46 0 14 Jul 2022
Decorate the Examples: A Simple Method of Prompt Design for Biomedical Relation Extraction Hui-Syuan Yeh Thomas Lavergne Pierre Zweigenbaum 19 10 0 21 Apr 2022
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech Guangyan Zhang Kaitao Song Xu Tan Daxin Tan Yuzi Yan ... G. Wang Wei Zhou Tao Qin Tan Lee Sheng Zhao SSL 20 21 0 31 Mar 2022
vTTS: visual-text to speech Yoshifumi Nakano Takaaki Saeki Shinnosuke Takamichi Katsuhito Sudoh Hiroshi Saruwatari 9 4 0 28 Mar 2022
Signal in Noise: Exploring Meaning Encoded in Random Character Sequences with Character-Aware Language Models Mark Chu Bhargav Srinivasa Desikan E. Nadler Ruggerio L. Sardo Elise Darragh-Ford Douglas Guilbeault 18 0 0 15 Mar 2022
An Ensemble of Pre-trained Transformer Models For Imbalanced Multiclass Malware Classification Ferhat Demirkiran Aykut Çayır U. Ünal Hasan Dag 30 42 0 25 Dec 2021
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP Sabrina J. Mielke Zaid Alyafeai Elizabeth Salesky Colin Raffel Manan Dey ... Arun Raja Chenglei Si Wilson Y. Lee Benoît Sagot Samson Tan 30 140 0 20 Dec 2021
Using Distributional Principles for the Semantic Study of Contextual Language Models Olivier Ferret 17 1 0 23 Nov 2021
Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching P. Chopra Sai Krishna Rallabandi A. Black Khyathi Raghavi Chandu 10 6 0 01 Nov 2021
Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios? Arij Riabi Benoît Sagot Djamé Seddah 26 15 0 26 Oct 2021
Low Frequency Names Exhibit Bias and Overfitting in Contextualizing Language Models Robert Wolfe Aylin Caliskan 85 51 0 01 Oct 2021
BERT Cannot Align Characters Antonis Maronikolakis Philipp Dufter Hinrich Schütze 23 0 0 20 Sep 2021
How Suitable Are Subword Segmentation Strategies for Translating Non-Concatenative Morphology? Chantal Amrhein Rico Sennrich 22 13 0 02 Sep 2021
Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens Itay Itzhak Omer Levy 17 18 0 25 Aug 2021
DravidianCodeMix: Sentiment Analysis and Offensive Language Identification Dataset for Dravidian Languages in Code-Mixed Text Bharathi Raja Chakravarthi R. Priyadharshini Vigneshwaran Muralidaran Navya Jose Shardul Suryawanshi E. Sherly John P. McCrae 17 104 0 17 Jun 2021
CodemixedNLP: An Extensible and Open NLP Toolkit for Code-Mixing Sai Muralidhar Jayanthi Kavya Nerella Khyathi Raghavi Chandu A. Black MoE 23 8 0 10 Jun 2021
IIITT@LT-EDI-EACL2021-Hope Speech Detection: There is always Hope in Transformers Karthik Puranik Adeep Hande R. Priyadharshini Sajeetha Thavareesan Bharathi Raja Chakravarthi 15 59 0 19 Apr 2021
AMMU : A Survey of Transformer-based Biomedical Pretrained Language Models Katikapalli Subramanyam Kalyan A. Rajasekharan S. Sangeetha LM&MA MedIm 18 164 0 16 Apr 2021
UniParma at SemEval-2021 Task 5: Toxic Spans Detection Using CharacterBERT and Bag-of-Words Model Akbar Karimi L. Rossi Andrea Prati 11 4 0 17 Mar 2021
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation J. Clark Dan Garrette Iulia Turc John Wieting 27 210 0 11 Mar 2021
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Yonghui Wu M. Schuster Z. Chen Quoc V. Le Mohammad Norouzi ... Alex Rudnick Oriol Vinyals G. Corrado Macduff Hughes J. Dean AIMat 716 6,743 0 26 Sep 2016