Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.18121
Cited By
BERTino: an Italian DistilBERT model
31 March 2023
Matteo Muffo
E. Bertino
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"BERTino: an Italian DistilBERT model"
14 / 14 papers shown
Title
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Zhiqing Sun
Hongkun Yu
Xiaodan Song
Renjie Liu
Yiming Yang
Denny Zhou
MQ
118
817
0
06 Apr 2020
Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
Mitchell A. Gordon
Kevin Duh
Nicholas Andrews
VLM
67
343
0
19 Feb 2020
BERTje: A Dutch BERT Model
Wietse de Vries
Andreas van Cranenburgh
Arianna Bisazza
Tommaso Caselli
Gertjan van Noord
Malvina Nissim
VLM
SSeg
78
296
0
19 Dec 2019
Multilingual is not enough: BERT for Finnish
Antti Virtanen
Jenna Kanerva
Rami Ilo
Jouni Luoma
Juhani Luotolahti
T. Salakoski
Filip Ginter
S. Pyysalo
83
281
0
15 Dec 2019
CamemBERT: a Tasty French Language Model
Louis Martin
Benjamin Muller
Pedro Ortiz Suarez
Yoann Dupont
Laurent Romary
Eric Villemonte de la Clergerie
Djamé Seddah
Benoît Sagot
126
976
0
10 Nov 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
257
7,554
0
02 Oct 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
95,229
0
11 Oct 2018
Deep contextualized word representations
Matthew E. Peters
Mark Neumann
Mohit Iyyer
Matt Gardner
Christopher Clark
Kenton Lee
Luke Zettlemoyer
NAI
233
11,565
0
15 Feb 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
805
132,725
0
12 Jun 2017
Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features
Matteo Pagliardini
Prakhar Gupta
Martin Jaggi
SSL
184
697
0
07 Mar 2017
Enriching Word Vectors with Subword Information
Piotr Bojanowski
Edouard Grave
Armand Joulin
Tomas Mikolov
NAI
SSL
VLM
234
9,986
0
15 Jul 2016
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
367
19,745
0
09 Mar 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
2.1K
150,364
0
22 Dec 2014
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
693
31,553
0
16 Jan 2013
1