An Open Dataset and Model for Language Identification

23 May 2023
Laurie Burchell
Alexandra Birch
Nikolay Bogoychev
Kenneth Heafield

Papers citing "An Open Dataset and Model for Language Identification"

27 / 27 papers shown
Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders
Boyi Deng
Yidan Zhang
Baosong Yang
Fuli Feng
46
0
0
08 May 2025
Improving Informally Romanized Language Identification
Adrian Benton
Alexander Gutkin
Christo Kirov
Brian Roark
55
0
0
30 Apr 2025
Improving Multilingual Retrieval-Augmented Language Models through Dialectic Reasoning Argumentations
Leonardo Ranaldi
Federico Ranaldi
Fabio Massimo Zanzotto
Barry Haddow
Alexandra Birch
RALM
LRM
43
0
0
07 Apr 2025
Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Task
Leonardo Ranaldi
Barry Haddow
Alexandra Birch
RALM
71
1
0
04 Apr 2025
Register Always Matters: Analysis of LLM Pretraining Data Through the Lens of Language Variation
A. Myntti
Erik Henriksson
Veronika Laippala
S. Pyysalo
38
0
0
02 Apr 2025
An Expanded Massive Multilingual Dataset for High-Performance Language Technologies
Laurie Burchell
Ona de Gibert
Nikolay Arefyev
Mikko Aulamo
Marta Bañón
...
Pavel Stepachev
and Jörg Tiedemann
Dušan Variš
Tereza Vojtěchová
Jaume Zaragoza-Bernabeu
43
2
0
13 Mar 2025
KréyoLID From Language Identification Towards Language Mining
Rasul Dent
Pedro Ortiz Suarez
Thibault Clérice
Benoît Sagot
53
0
0
09 Mar 2025
Multi-label Scandinavian Language Identification (SLIDE)
Mariia Fedorova
Jonas Sebulon Frydenberg
Victoria Handford
Victoria Ovedie Chruickshank Langø
Solveig Helene Willoch
Marthe Løken Midtgaard
Yves Scherrer
Petter Mæhlum
David Samuel
59
0
0
10 Feb 2025
Language Fusion for Parameter-Efficient Cross-lingual Transfer
Philipp Borchert
Ivan Vulić
Marie-Francine Moens
Jochen De Weerdt
41
0
0
12 Jan 2025
AFRIDOC-MT: Document-level MT Corpus for African Languages
Jesujoba Oluwadara Alabi
Israel Abebe Azime
Miaoran Zhang
C. España-Bonet
Rachel Bawden
...
Shamsuddeen Hassan Muhammad
Neo Putini
David O. Ademuyiwa
Andrew Caines
Dietrich Klakow
39
0
0
10 Jan 2025
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages
Amir Hossein Kargaran
François Yvon
Hinrich Schütze
VLM
49
5
0
31 Oct 2024
From N-grams to Pre-trained Multilingual Models For Language Identification
Thapelo Sindane
Vukosi Marivate
29
1
0
11 Oct 2024
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Matthieu Futeral
A. Zebaze
Pedro Ortiz Suarez
Julien Abadji
Rémi Lacroix
Cordelia Schmid
Rachel Bawden
Benoît Sagot
41
3
0
13 Jun 2024
MaskLID: Code-Switching Language Identification through Iterative Masking
Amir Hossein Kargaran
François Yvon
Hinrich Schütze
37
2
0
10 Jun 2024
FAME-MT Dataset: Formality Awareness Made Easy for Machine Translation Purposes
Dawid Wiśniewski
Zofia Rostek
Artur Nowakowski
47
0
0
20 May 2024
Chasing COMET: Leveraging Minimum Bayes Risk Decoding for Self-Improving Machine Translation
Kamil Guttmann
Miko Pokrywka
Adrian Charkiewicz
Artur Nowakowski
58
3
0
20 May 2024
The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights
Wenhao Zhu
Shujian Huang
Fei Yuan
Cheng Chen
Jiajun Chen
Alexandra Birch
LRM
52
5
0
02 May 2024
FastSpell: the LangId Magic Spell
Marta Bañón
Jaume Zaragoza-Bernabeu
Gema Ramírez-Sánchez
Sergio Ortiz-Rojas
40
2
0
12 Apr 2024
Geographically-Informed Language Identification
Jonathan Dunn
Lane Edwards-Brown
21
2
0
14 Mar 2024
The Hidden Space of Transformer Language Adapters
Jesujoba Oluwadara Alabi
Marius Mosbach
Matan Eyal
Dietrich Klakow
Mor Geva
59
8
1
20 Feb 2024
Code-Switched Language Identification is Harder Than You Think
Laurie Burchell
Alexandra Birch
Robert P. Thompson
Kenneth Heafield
32
0
0
02 Feb 2024
Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?
Tannon Kew
Florian Schottmann
Rico Sennrich
LRM
34
36
0
20 Dec 2023
Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability
Wei-Rui Chen
Ife Adebara
Khai Duy Doan
Qisheng Liao
Muhammad Abdul-Mageed
30
5
0
16 Nov 2023
GlotLID: Language Identification for Low-Resource Languages
Amir Hossein Kargaran
Ayyoob Imani
François Yvon
Hinrich Schütze
30
11
0
24 Oct 2023
Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca
Pinzhen Chen
Shaoxiong Ji
Nikolay Bogoychev
Andrey Kutuzov
Barry Haddow
Kenneth Heafield
46
45
0
16 Sep 2023
Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Decoding
Rico Sennrich
Jannis Vamvas
Alireza Mohammadshahi
HILM
37
39
0
13 Sep 2023
LIMIT: Language Identification, Misidentification, and Translation using Hierarchical Models in 350+ Languages
M. Agarwal
Md Mahfuz Ibn Alam
Antonios Anastasopoulos
38
5
0
23 May 2023