Sinkhorn Distance Minimization for Knowledge Distillation
arXiv:2402.17110 · 27 February 2024
Xiao Cui, Yulei Qin, Yuting Gao, Enwei Zhang, Zihan Xu, Tong Wu, Ke Li, Xing Sun, Wen-gang Zhou, Houqiang Li
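
The title refers to using the Sinkhorn distance, i.e. entropy-regularized optimal transport computed with Sinkhorn-Knopp iterations, as the distillation objective between teacher and student output distributions, in place of the pointwise KL term of standard knowledge distillation. The snippet below is a minimal sketch of such a loss in PyTorch; it illustrates the general technique rather than the authors' exact formulation, and the ground-cost matrix, regularization strength eps, and iteration count are assumed placeholders.

```python
import torch

def sinkhorn_distillation_loss(student_logits, teacher_logits, cost, eps=0.1, n_iters=50):
    """Entropy-regularized OT (Sinkhorn) distance between student and teacher
    output distributions. `cost` is an (n, n) ground-cost matrix over classes;
    eps and n_iters are illustrative defaults, not values from the paper."""
    p = torch.softmax(student_logits, dim=-1)   # (batch, n) student distribution
    q = torch.softmax(teacher_logits, dim=-1)   # (batch, n) teacher distribution
    K = torch.exp(-cost / eps)                  # Gibbs kernel; a log-domain version would be more stable
    u = torch.ones_like(p)
    for _ in range(n_iters):                    # Sinkhorn-Knopp scaling iterations
        v = q / (u @ K + 1e-9)
        u = p / (v @ K.T + 1e-9)
    T = u.unsqueeze(-1) * K * v.unsqueeze(-2)   # transport plan, (batch, n, n)
    return (T * cost).sum(dim=(-2, -1)).mean()  # mean Sinkhorn distance over the batch
```

In a distillation loop this term would stand in for (or complement) the usual soft-label KL loss, with gradients reaching the student through the unrolled Sinkhorn iterations.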

Papers citing "Sinkhorn Distance Minimization for Knowledge Distillation" (8 papers)

Scaling Laws for Speculative Decoding (08 May 2025)
Siyuan Yan, Mo Zhu, Guo-qing Jiang, Jianfei Wang, Jiaxing Chen, ..., Xiang Liao, Xiao Cui, Chen Zhang, Zhuoran Song, Ran Zhu

Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models (19 Dec 2024)
Xiao Cui, Mo Zhu, Yulei Qin, Liang Xie, Wengang Zhou, Yiming Li

Improving Neural Cross-Lingual Summarization via Employing Optimal Transport Distance for Knowledge Distillation (07 Dec 2021)
Thong Nguyen, A. Luu

Multitask Prompted Training Enables Zero-Shot Task Generalization (15 Oct 2021)
Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, ..., T. Bers, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M. Rush

Distilling Linguistic Context for Language Model Compression (17 Sep 2021)
Geondo Park, Gyeongman Kim, Eunho Yang

Learning Student-Friendly Teacher Networks for Knowledge Distillation (12 Feb 2021)
D. Park, Moonsu Cha, C. Jeong, Daesin Kim, Bohyung Han

Training Deep Energy-Based Models with f-Divergence Minimization (06 Mar 2020)
Lantao Yu, Yang Song, Jiaming Song, Stefano Ermon

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (20 Apr 2018)
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman