2204.08582
Cited By
MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages
18 April 2022
Jack G. M. FitzGerald
C. Hench
Charith Peris
Scott Mackie
Kay Rottmann
A. Sánchez
Aaron Nash
Liam Urbach
Vishesh Kakarala
Richa Singh
Swetha Ranganath
Laurie Crist
Misha Britan
Wouter Leeuwis
Gokhan Tur
Premkumar Natarajan
Papers citing
"MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages"
27 / 27 papers shown
Title
Survey of Abstract Meaning Representation: Then, Now, Future
Behrooz Mansouri
3DV
239
0
0
06 May 2025
Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
Tiansheng Wen
Yifei Wang
Zequn Zeng
Zhong Peng
Yudi Su
Xinyang Liu
Bo Chen
Hongwei Liu
Stefanie Jegelka
Chenyu You
CLL
79
3
0
03 Mar 2025
ARISE: Iterative Rule Induction and Synthetic Data Generation for Text Classification
Y. Meena
Vaibhav Singh
Ayush Maheshwari
Amrith Krishna
Ganesh Ramakrishnan
AI4TS
187
0
0
09 Feb 2025
Automatic Labelling with Open-source LLMs using Dynamic Label Schema Integration
Thomas Walshe
S. Moon
Chunyang Xiao
Yawwani Gunawardana
Fran Silavong
50
2
0
21 Jan 2025
Text Clustering as Classification with LLMs
Chen Huang
Guoxiu He
44
2
0
03 Jan 2025
Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond
Beomseok Lee
Ioan Calapodescu
Marco Gaido
Matteo Negri
Laurent Besacier
AuLLM
39
4
0
07 Aug 2024
The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding
Kenneth Enevoldsen
Márton Kardos
Niklas Muennighoff
Kristoffer Nielbo
42
9
0
04 Jun 2024
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
Chankyu Lee
Rajarshi Roy
Mengyao Xu
Jonathan Raiman
Mohammad Shoeybi
Bryan Catanzaro
Ming-Yu Liu
RALM
70
152
0
27 May 2024
k* Distribution: Evaluating the Latent Space of Deep Neural Networks using Local Neighborhood Analysis
Shashank Kotyan
Tatsuya Ueda
Danilo Vasconcellos Vargas
39
1
0
07 Dec 2023
Primacy Effect of ChatGPT
Yiwei Wang
Yujun Cai
Muhao Chen
Keli Zhang
Bryan Hooi
ALM
AI4MH
LRM
38
15
0
20 Oct 2023
Language Models are Universal Embedders
Xin Zhang
Zehan Li
Yanzhao Zhang
Dingkun Long
Pengjun Xie
Meishan Zhang
Min Zhang
KELM
ELM
58
6
0
12 Oct 2023
CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss
R. S. Srinivasa
Jaejin Cho
Chouchang Yang
Yashas Malur Saidutta
Ching Hua Lee
Yilin Shen
Hongxia Jin
VLM
38
8
0
26 Sep 2023
Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems
Songbo Hu
Han Zhou
Mete Hergul
Milan Gritta
Guchun Zhang
Ignacio Iacobacci
Ivan Vulić
Anna Korhonen
41
10
0
26 Jul 2023
mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
Peiqin Lin
Chengzhi Hu
Zheyu Zhang
André F. T. Martins
Hinrich Schütze
37
1
0
23 May 2023
Zero-Shot End-to-End Spoken Language Understanding via Cross-Modal Selective Self-Training
Jianfeng He
Julian Salazar
Kaisheng Yao
Haoqi Li
Jason (Jinglun) Cai
VLM
17
7
0
22 May 2023
Generalized Multiple Intent Conditioned Slot Filling
Harshil Shah
Arthur Wilcke
Marius Cobzarenco
Cristian C Cobzarenco
Edward Challis
David Barber
18
0
0
18 May 2023
Measuring and Mitigating Local Instability in Deep Neural Networks
Arghya Datta
Subhrangshu Nandi
Jingcheng Xu
Greg Ver Steeg
He Xie
Anoop Kumar
Aram Galstyan
30
3
0
18 May 2023
The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation
Mutian He
Philip N. Garner
46
4
0
16 May 2023
DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical domains
Yanis Labrak
Adrien Bazoge
Richard Dufour
Mickael Rouvier
Emmanuel Morin
B. Daille
P. Gourraud
LM&MA
25
54
0
03 Apr 2023
RETVec: Resilient and Efficient Text Vectorizer
Elie Bursztein
Marina Zhang
Owen Vallis
Xinyu Jia
Alexey Kurakin
VLM
32
4
0
18 Feb 2023
MULTI3NLU++: A Multilingual, Multi-Intent, Multi-Domain Dataset for Natural Language Understanding in Task-Oriented Dialogue
Nikita Moghe
E. Razumovskaia
Liane Guillou
Ivan Vulić
Anna Korhonen
Alexandra Birch
45
13
0
20 Dec 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
154
2,319
0
09 Nov 2022
Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks
Colin Leong
Joshua Nemecek
Jacob Mansdorfer
Anna Filighera
A. Owodunni
Daniel Whitenack
VLM
AI4CE
51
25
0
26 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
129
95
0
06 Oct 2022
LINGUIST: Language Model Instruction Tuning to Generate Annotated Utterances for Intent Classification and Slot Tagging
Andrew Rosenbaum
Saleh Soltan
Wael Hamza
Yannick Versley
M. Boese
29
43
0
20 Sep 2022
Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation
Olga Majewska
E. Razumovskaia
Edoardo Ponti
Ivan Vulić
Anna Korhonen
39
28
0
31 Jan 2022
Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue Systems
E. Razumovskaia
Goran Glavaš
Olga Majewska
Edoardo Ponti
Anna Korhonen
Ivan Vulić
36
32
0
17 Apr 2021