Scalable Syntax-Aware Language Models Using Knowledge Distillation
arXiv:1906.06438 · 14 June 2019
A. Kuncoro, Chris Dyer, Laura Rimell, S. Clark, Phil Blunsom

Papers citing "Scalable Syntax-Aware Language Models Using Knowledge Distillation" (3 of 3 papers shown)

Can We Use Probing to Better Understand Fine-tuning and Knowledge Distillation of the BERT NLU?
Jakub Hościlowicz, Marcin Sowanski, Piotr Czubowski, Artur Janicki
27 Jan 2023

Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation
Emilio Parisotto, Ruslan Salakhutdinov
04 Apr 2021

What you can cram into a single vector: Probing sentence embeddings for linguistic properties
Alexis Conneau, Germán Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni
03 May 2018