Learning from Teaching Regularization: Generalizable Correlations Should be Easy to Imitate
arXiv: 2402.02769 · 5 February 2024
Authors: Can Jin, Tong Che, Hongwu Peng, Yiyuan Li, Dimitris N. Metaxas, Marco Pavone
Papers citing "Learning from Teaching Regularization: Generalizable Correlations Should be Easy to Imitate" (39 / 39 papers shown)

| Title | Authors | Tags | Citations | Published |
|---|---|---|---|---|
| LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS | Kai Mei, Xi Zhu, Hang Gao, Shuhang Lin, Yongfeng Zhang | | 0 | 24 May 2025 |
| Self-GIVE: Associative Thinking from Limited Structured Knowledge for Enhanced Large Language Model Reasoning | Jiashu He, Jinxuan Fan, Bowen Jiang, Ignacio Houine, Dan Roth, Alejandro Ribeiro | ReLM, RALM, LRM | 2 | 21 May 2025 |
| RankFlow: A Multi-Role Collaborative Reranking Workflow Utilizing Large Language Models | Can Jin, Hongwu Peng, Anxiang Zhang, Nuo Chen, Jiahui Zhao, ..., Keqin Li, Shuya Feng, Kai Zhong, Caiwen Ding, Dimitris N. Metaxas | | 2 | 02 Feb 2025 |
| APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking | Can Jin, Hongwu Peng, Shiyu Zhao, Zhenting Wang, Wujiang Xu, Ligong Han, Jiahui Zhao, Kai Zhong, Sanguthevar Rajasekaran, Dimitris N. Metaxas | KELM | 33 | 20 Jun 2024 |
| Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective | Can Jin, Tianjin Huang, Yihua Zhang, Mykola Pechenizkiy, Sijia Liu, Shiwei Liu, Tianlong Chen | VLM | 26 | 03 Dec 2023 |
| MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning | Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu-Chuan Su, Wenhu Chen | AIMat, LRM | 397 | 11 Sep 2023 |
| AutoReP: Automatic ReLU Replacement for Fast Private Network Inference | Hongwu Peng, Shaoyi Huang, Tong Zhou, Yukui Luo, Chenghong Wang, ..., Tony Geng, Kaleel Mahmood, Wujie Wen, Xiaolin Xu, Caiwen Ding | OffRL | 38 | 20 Aug 2023 |
| Decoupled Knowledge Distillation | Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, Jiajun Liang | | 538 | 16 Mar 2022 |
| Does Knowledge Distillation Really Work? | Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, A. Wilson | FedML | 220 | 10 Jun 2021 |
| Distilling Knowledge via Knowledge Review | Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia | | 434 | 19 Apr 2021 |
| Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, B. Guo | ViT | 21,347 | 25 Mar 2021 |
| ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases | Stéphane d'Ascoli, Hugo Touvron, Matthew L. Leavitt, Ari S. Morcos, Giulio Biroli, Levent Sagun | ViT | 824 | 19 Mar 2021 |
| Learning Student-Friendly Teacher Networks for Knowledge Distillation | D. Park, Moonsu Cha, C. Jeong, Daesin Kim, Bohyung Han | | 101 | 12 Feb 2021 |
| Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks | Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste | MQ | 712 | 31 Jan 2021 |
| Training data-efficient image transformers & distillation through attention | Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou | ViT | 6,731 | 23 Dec 2020 |
| Transferring Inductive Biases through Knowledge Distillation | Samira Abnar, Mostafa Dehghani, Willem H. Zuidema | | 58 | 31 May 2020 |
| Compositionality and Generalization in Emergent Languages | Rahma Chaabouni, Eugene Kharitonov, Diane Bouchacourt, Emmanuel Dupoux, Marco Baroni | CoGe, AI4CE | 139 | 20 Apr 2020 |
| Ease-of-Teaching and Language Structure from Emergent Communication | Fushan Li, Michael Bowling | | 102 | 06 Jun 2019 |
| Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov | VLM | 3,724 | 09 Jan 2019 |
| SNIP: Single-shot Network Pruning based on Connection Sensitivity | Namhoon Lee, Thalaiyasingam Ajanthan, Philip Torr | VLM | 1,196 | 04 Oct 2018 |
| MobileNetV2: Inverted Residuals and Linear Bottlenecks | Mark Sandler, Andrew G. Howard, Menglong Zhu, A. Zhmoginov, Liang-Chieh Chen | | 19,204 | 13 Jan 2018 |
| Regularization for Deep Learning: A Taxonomy | J. Kukačka, Vladimir Golkov, Daniel Cremers | | 335 | 29 Oct 2017 |
| Proximal Policy Optimization Algorithms | John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov | OffRL | 18,931 | 20 Jul 2017 |
| A Closer Look at Memorization in Deep Networks | Devansh Arpit, Stanislaw Jastrzebski, Nicolas Ballas, David M. Krueger, Emmanuel Bengio, ..., Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien | TDI | 1,814 | 16 Jun 2017 |
| Understanding deep learning requires rethinking generalization | Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals | HAI | 4,624 | 10 Nov 2016 |
| Entropy-SGD: Biasing Gradient Descent Into Wide Valleys | Pratik Chaudhari, A. Choromańska, Stefano Soatto, Yann LeCun, Carlo Baldassi, C. Borgs, J. Chayes, Levent Sagun, R. Zecchina | ODL | 773 | 06 Nov 2016 |
| Pointer Sentinel Mixture Models | Stephen Merity, Caiming Xiong, James Bradbury, R. Socher | RALM | 2,844 | 26 Sep 2016 |
| Pruning Filters for Efficient ConvNets | Hao Li, Asim Kadav, Igor Durdanovic, H. Samet, H. Graf | 3DPC | 3,693 | 31 Aug 2016 |
| Learning without Forgetting | Zhizhong Li, Derek Hoiem | CLL, OOD, SSL | 4,391 | 29 Jun 2016 |
| Learning to Communicate with Deep Multi-Agent Reinforcement Learning | Jakob N. Foerster, Yannis Assael, Nando de Freitas, Shimon Whiteson | | 1,605 | 21 May 2016 |
| Improving Neural Machine Translation Models with Monolingual Data | Rico Sennrich, Barry Haddow, Alexandra Birch | | 2,716 | 20 Nov 2015 |
| Policy Distillation | Andrei A. Rusu, Sergio Gomez Colmenarejo, Çağlar Gülçehre, Guillaume Desjardins, J. Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, R. Hadsell | | 690 | 19 Nov 2015 |
| Unifying distillation and privileged information | David Lopez-Paz, Léon Bottou, Bernhard Schölkopf, V. Vapnik | FedML | 462 | 11 Nov 2015 |
| Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift | Sergey Ioffe, Christian Szegedy | OOD | 43,234 | 11 Feb 2015 |
| FitNets: Hints for Thin Deep Nets | Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, C. Gatta, Yoshua Bengio | FedML | 3,870 | 19 Dec 2014 |
| Recurrent Neural Network Regularization | Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals | ODL | 2,774 | 08 Sep 2014 |
| Do Deep Nets Really Need to be Deep? | Lei Jimmy Ba, R. Caruana | | 2,117 | 21 Dec 2013 |
| Improving neural networks by preventing co-adaptation of feature detectors | Geoffrey E. Hinton, Nitish Srivastava, A. Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov | VLM | 7,658 | 03 Jul 2012 |
| L2 Regularization for Learning Kernels | Corinna Cortes, M. Mohri, Afshin Rostamizadeh | | 443 | 09 May 2012 |