ResearchTrend.AI
Superbizarre Is Not Superb: Derivational Morphology Improves BERT's Interpretation of Complex Words

2 January 2021
Valentin Hofmann
J. Pierrehumbert
Hinrich Schütze
arXiv: 2101.00403

Papers citing "Superbizarre Is Not Superb: Derivational Morphology Improves BERT's Interpretation of Complex Words"

46 / 46 papers shown
SuperBPE: Space Travel for Language Models
Alisa Liu
J. Hayase
Valentin Hofmann
Sewoong Oh
Noah A. Smith
Yejin Choi
53
3
0
17 Mar 2025
Model Decides How to Tokenize: Adaptive DNA Sequence Tokenization with MxDNA
Lifeng Qiao
Peng Ye
Yuchen Ren
Weiqiang Bai
Chaoqi Liang
Xinzhu Ma
Nanqing Dong
W. Ouyang
99
2
0
18 Dec 2024
Evaluating Morphological Compositional Generalization in Large Language Models
Mete Ismayilzada
Yuan Chiang
Jonne Sälevä
Hale Sirin
Abdullatif Köksal
Bhuwan Dhingra
Antoine Bosselut
Lonneke van der Plas
Duygu Ataman
41
2
0
16 Oct 2024
Tokenization and Morphology in Multilingual Language Models: A Comparative Analysis of mT5 and ByT5
Thao Anh Dang
Limor Raviv
Lukas Galke
27
1
0
15 Oct 2024
Morphological evaluation of subwords vocabulary used by BETO language model
Óscar García-Sierra
Ana Fernández-Pampillón Cesteros
Miguel Ortega-Martín
41
0
0
03 Oct 2024
BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training
Pavel Chizhov
Catherine Arnett
Elizaveta Korotkova
Ivan P. Yamshchikov
50
2
0
06 Sep 2024
Latin Treebanks in Review: An Evaluation of Morphological Tagging Across Time
Marisa Hudspeth
Brendan O’Connor
Laure Thompson
41
1
0
13 Aug 2024
Unsupervised Morphological Tree Tokenizer
Qingyang Zhu
Xiang Hu
Pengyu Ji
Wei Wu
Kewei Tu
39
0
0
21 Jun 2024
HeSum: a Novel Dataset for Abstractive Text Summarization in Hebrew
Tzuf Paz-Argaman
Itai Mondshine
Asaf Achi Mordechai
Reut Tsarfaty
40
2
0
06 Jun 2024
Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization
Dixuan Wang
Yanda Li
Junyuan Jiang
Zepeng Ding
Ziqin Luo
Guochao Jiang
Jiaqing Liang
Deqing Yang
34
11
0
27 May 2024
Time Machine GPT
Felix Drinkall
Eghbal Rahimikia
J. Pierrehumbert
Stefan Zohren
AI4TS
AI4CE
KELM
SyDa
44
3
0
29 Apr 2024
Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge
Khuyagbaatar Batsuren
Ekaterina Vylomova
Verna Dankers
Tsetsuukhei Delgerbaatar
Omri Uzan
Yuval Pinter
Gábor Bella
42
10
0
20 Apr 2024
A Morphology-Based Investigation of Positional Encodings
Poulami Ghosh
Shikhar Vashishth
Raj Dabre
Pushpak Bhattacharyya
34
1
0
06 Apr 2024
Verbing Weirds Language (Models): Evaluation of English Zero-Derivation in Five LLMs
David R. Mortensen
Valentina Izrailevitch
Yunze Xiao
Hinrich Schütze
Leonie Weissweiler
20
5
0
26 Mar 2024
Different Tokenization Schemes Lead to Comparable Performance in Spanish Number Agreement
Catherine Arnett
Pamela D. Rivière
Tyler A. Chang
Sean Trott
29
2
0
20 Mar 2024
Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance
Omer Goldman
Avi Caciularu
Matan Eyal
Kris Cao
Idan Szpektor
Reut Tsarfaty
51
23
0
10 Mar 2024
Greed is All You Need: An Evaluation of Tokenizer Inference Methods
Omri Uzan
Craig W. Schmidt
Chris Tanner
Yuval Pinter
51
14
0
02 Mar 2024
Tokenization Is More Than Compression
Craig W. Schmidt
Varshini Reddy
Haoran Zhang
Alec Alameddine
Omri Uzan
Yuval Pinter
Chris Tanner
61
28
0
28 Feb 2024
The Impact of Word Splitting on the Semantic Content of Contextualized Word Representations
Aina Garí Soler
Matthieu Labeau
Chloé Clavel
VLM
47
2
0
22 Feb 2024
DrBenchmark: A Large Language Understanding Evaluation Benchmark for French Biomedical Domain
Yanis Labrak
Adrien Bazoge
Oumaima El Khettari
Mickael Rouvier
Pacome Constant dit Beaufils
...
B. Daille
Solen Quiniou
Emmanuel Morin
P. Gourraud
Richard Dufour
LM&MA
34
6
0
20 Feb 2024
Paloma: A Benchmark for Evaluating Language Model Fit
Ian H. Magnusson
Akshita Bhagia
Valentin Hofmann
Luca Soldaini
A. Jha
...
Iz Beltagy
Hanna Hajishirzi
Noah A. Smith
Kyle Richardson
Jesse Dodge
140
21
0
16 Dec 2023
Impact of Tokenization on LLaMa Russian Adaptation
Mikhail Tikhomirov
D. Chernyshev
35
4
0
05 Dec 2023
Explicit Morphological Knowledge Improves Pre-training of Language Models for Hebrew
Eylon Gueta
Omer Goldman
Reut Tsarfaty
24
1
0
01 Nov 2023
BERTwich: Extending BERT's Capabilities to Model Dialectal and Noisy Text
Aarohi Srivastava
David Chiang
30
6
0
31 Oct 2023
Counting the Bugs in ChatGPT's Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model
Leonie Weissweiler
Valentin Hofmann
Anjali Kantharuban
Anna Cai
Ritam Dutt
...
Abhishek Vijayakumar
Haofei Yu
Hinrich Schütze
Kemal Oflazer
David R. Mortensen
38
10
0
23 Oct 2023
Analyzing Cognitive Plausibility of Subword Tokenization
Lisa Beinborn
Yuval Pinter
29
17
0
20 Oct 2023
Sentence Embedding Models for Ancient Greek Using Multilingual Knowledge Distillation
Kevin Krahn
D. Tate
Andrew C. Lamicela
25
4
0
24 Aug 2023
Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data
Xinzhe Li
Ming Liu
Shang Gao
MU
53
8
0
02 Jul 2023
Biomedical Language Models are Robust to Sub-optimal Tokenization
Bernal Jiménez Gutiérrez
Huan Sun
Yu-Chuan Su
22
6
0
30 Jun 2023
CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models
Benjamin Minixhofer
Jonas Pfeiffer
Ivan Vulić
37
6
0
23 May 2023
Language Model Tokenizers Introduce Unfairness Between Languages
Aleksandar Petrov
Emanuele La Malfa
Philip Torr
Adel Bibi
52
98
0
17 May 2023
Effects of sub-word segmentation on performance of transformer language models
Jue Hou
Anisia Katinskaia
Anh Vu
R. Yangarber
21
4
0
09 May 2023
What do Large Language Models Learn beyond Language?
Avinash Madasu
Shashank Srivastava
LRM
AI4CE
44
5
0
21 Oct 2022
Incorporating Context into Subword Vocabularies
Shaked Yehezkel
Yuval Pinter
47
8
0
13 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
129
95
0
06 Oct 2022
Linguistically inspired roadmap for building biologically reliable protein language models
Mai Ha Vu
Rahmad Akbar
Philippe A. Robert
B. Swiatczak
Victor Greiff
G. K. Sandve
Dag Trygve Tryslew Haug
52
35
0
03 Jul 2022
How Adults Understand What Young Children Say
Stephan C. Meylan
Ruthe Foushee
Nicole H. L. Wong
Elika Bergelson
R. Levy
11
4
0
15 Jun 2022
Improving Tokenisation by Alternative Treatment of Spaces
Edward Gow-Smith
Harish Tayyar Madabushi
Carolina Scarton
Aline Villavicencio
37
20
0
08 Apr 2022
Morphological Processing of Low-Resource Languages: Where We Are and What's Next
Adam Wiemerslage
Miikka Silfverberg
Changbing Yang
Arya D. McCarthy
Garrett Nicolai
Eliana Colunga
Katharina Kann
36
12
0
16 Mar 2022
Signal in Noise: Exploring Meaning Encoded in Random Character Sequences with Character-Aware Language Models
Mark Chu
Bhargav Srinivasa Desikan
E. Nadler
Ruggerio L. Sardo
Elise Darragh-Ford
Douglas Guilbeault
25
0
0
15 Mar 2022
Morphology Without Borders: Clause-Level Morphology
Omer Goldman
Reut Tsarfaty
AILaw
49
3
0
25 Feb 2022
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Sabrina J. Mielke
Zaid Alyafeai
Elizabeth Salesky
Colin Raffel
Manan Dey
...
Arun Raja
Chenglei Si
Wilson Y. Lee
Benoît Sagot
Samson Tan
34
143
0
20 Dec 2021
Efficient Intent Detection with Dual Sentence Encoders
I. Casanueva
Tadas Temčinas
D. Gerz
Matthew Henderson
Ivan Vulić
VLM
180
454
0
10 Mar 2020
Probabilistic FastText for Multi-Sense Word Embeddings
Ben Athiwaratkun
A. Wilson
Anima Anandkumar
34
137
0
07 Jun 2018
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Zhehuai Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
718
6,750
0
26 Sep 2016
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
322
31,297
0
16 Jan 2013