BERTino: an Italian DistilBERT model

31 March 2023

Papers cited by "BERTino: an Italian DistilBERT model"

Title | Authors | Tags | Counts | Date
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices | Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou | MQ | 118, 817, 0 | 06 Apr 2020
Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning | Mitchell A. Gordon, Kevin Duh, Nicholas Andrews | VLM | 67, 343, 0 | 19 Feb 2020
BERTje: A Dutch BERT Model | Wietse de Vries, Andreas van Cranenburgh, Arianna Bisazza, Tommaso Caselli, Gertjan van Noord, Malvina Nissim | VLM, SSeg | 78, 296, 0 | 19 Dec 2019
Multilingual is not enough: BERT for Finnish | Antti Virtanen, Jenna Kanerva, Rami Ilo, Jouni Luoma, Juhani Luotolahti, T. Salakoski, Filip Ginter, S. Pyysalo | - | 83, 281, 0 | 15 Dec 2019
CamemBERT: a Tasty French Language Model | Louis Martin, Benjamin Muller, Pedro Ortiz Suarez, Yoann Dupont, Laurent Romary, Eric Villemonte de la Clergerie, Djamé Seddah, Benoît Sagot | - | 126, 976, 0 | 10 Nov 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf | - | 257, 7,554, 0 | 02 Oct 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova | VLM, SSL, SSeg | 1.8K, 95,229, 0 | 11 Oct 2018
Deep contextualized word representations | Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer | NAI | 233, 11,565, 0 | 15 Feb 2018
Attention Is All You Need | Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin | 3DV | 805, 132,725, 0 | 12 Jun 2017
Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features | Matteo Pagliardini, Prakhar Gupta, Martin Jaggi | SSL | 184, 697, 0 | 07 Mar 2017
Enriching Word Vectors with Subword Information | Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov | NAI, SSL, VLM | 234, 9,986, 0 | 15 Jul 2016
Distilling the Knowledge in a Neural Network | Geoffrey E. Hinton, Oriol Vinyals, J. Dean | FedML | 367, 19,745, 0 | 09 Mar 2015
Adam: A Method for Stochastic Optimization | Diederik P. Kingma, Jimmy Ba | ODL | 2.1K, 150,364, 0 | 22 Dec 2014
Efficient Estimation of Word Representations in Vector Space | Tomas Mikolov, Kai Chen, G. Corrado, J. Dean | 3DV | 693, 31,553, 0 | 16 Jan 2013