Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2002.10957
Cited By
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
25 February 2020
Wenhui Wang
Furu Wei
Li Dong
Hangbo Bao
Nan Yang
Ming Zhou
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers"
50 / 90 papers shown
Title
LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision
A. Fuller
Yousef Yassin
Junfeng Wen
Daniel G. Kyrollos
Tarek Ibrahim
James R. Green
Evan Shelhamer
ViT
75
0
0
23 May 2025
Surfacing Semantic Orthogonality Across Model Safety Benchmarks: A Multi-Dimensional Analysis
Jonathan Bennion
Shaona Ghosh
Mantek Singh
Nouha Dziri
73
0
0
23 May 2025
Curriculum Guided Reinforcement Learning for Efficient Multi Hop Retrieval Augmented Generation
Yuelyu Ji
Rui Meng
Zhuochun Li
Daqing He
55
0
0
23 May 2025
Model alignment using inter-modal bridges
Ali Gholamzadeh
Noor Sajid
91
0
0
18 May 2025
MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities
Jingxue Chen
Qingkun Tang
Qianchun Lu
Siyuan Fang
50
0
0
17 May 2025
ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via
α
α
α
-
β
β
β
-Divergence
Guanghui Wang
Zhiyong Yang
Ziyi Wang
Shi Wang
Qianqian Xu
Qingming Huang
118
0
0
07 May 2025
A Reasoning-Focused Legal Retrieval Benchmark
Lucia Zheng
Neel Guha
Javokhir Arifov
Sarah Zhang
Michal Skreta
Christopher D. Manning
Peter Henderson
Daniel E. Ho
AILaw
RALM
ELM
148
4
0
06 May 2025
Learning Critically: Selective Self Distillation in Federated Learning on Non-IID Data
Yuting He
Yiqiang Chen
Xiaodong Yang
H. Yu
Yi-Hua Huang
Yang Gu
FedML
112
21
0
20 Apr 2025
OnRL-RAG: Real-Time Personalized Mental Health Dialogue System
Ahsan Bilal
Beiyu Lin
OffRL
RALM
AI4MH
76
1
0
02 Apr 2025
A Retrieval-Based Approach to Medical Procedure Matching in Romanian
Andrei Niculae
Adrian Cosma
Emilian Radoi
83
0
0
26 Mar 2025
ConSCompF: Consistency-focused Similarity Comparison Framework for Generative Large Language Models
Alexey Karev
Dong Xu
80
0
0
18 Mar 2025
Moving Past Single Metrics: Exploring Short-Text Clustering Across Multiple Resolutions
Justin K. Miller
Tristram J. Alexander
67
1
0
24 Feb 2025
Uncertainty-Aware Step-wise Verification with Generative Reward Models
Zihuiwen Ye
Luckeciano C. Melo
Younesse Kaddar
Phil Blunsom
Shivalika Singh
Yarin Gal
LRM
99
2
0
16 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
166
6
0
12 Feb 2025
Benchmarking Prompt Sensitivity in Large Language Models
Amirhossein Razavi
Mina Soltangheis
Negar Arabzadeh
Sara Salamat
Morteza Zihayat
Ebrahim Bagheri
80
3
0
09 Feb 2025
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs
Nicolas Boizard
Kevin El Haddad
C´eline Hudelot
Pierre Colombo
104
15
0
28 Jan 2025
TAD-Bench: A Comprehensive Benchmark for Embedding-Based Text Anomaly Detection
Yang Cao
Sikun Yang
Chen Li
Haolong Xiang
Lianyong Qi
Bo Liu
Rongsheng Li
Ming Liu
71
0
0
21 Jan 2025
ACORD: An Expert-Annotated Retrieval Dataset for Legal Contract Drafting
Steven H. Wang
Maksim Zubkov
Kexin Fan
Sarah Harrell
Yuyang Sun
Wei Chen
Andreas Plesner
Roger Wattenhofer
AILaw
71
2
0
11 Jan 2025
Contextual ASR Error Handling with LLMs Augmentation for Goal-Oriented Conversational AI
Yuya Asano
Sabit Hassan
P. Sharma
Anthony Sicilia
Katherine Atwell
Diane Litman
Malihe Alikhani
74
1
0
10 Jan 2025
LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models
Miao Yu
Sihang Li
Yingjie Zhou
Xing Fan
Kun Wang
Shirui Pan
Qingsong Wen
AAML
90
1
0
03 Jan 2025
GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search
Matan Ben-Tov
Mahmood Sharif
RALM
84
1
0
31 Dec 2024
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Manan Suri
Puneet Mathur
Franck Dernoncourt
Kanika Goswami
Ryan Rossi
Dinesh Manocha
114
4
0
14 Dec 2024
FuseGPT: Learnable Layers Fusion of Generative Pre-trained Transformers
Zehua Pei
Hui-Ling Zhen
Xianzhi Yu
Sinno Jialin Pan
Mingxuan Yuan
Bei Yu
AI4CE
141
3
0
21 Nov 2024
DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization
Hexuan Deng
Wenxiang Jiao
Xuebo Liu
Min Zhang
Zhaopeng Tu
Zhaopeng Tu
VLM
149
0
0
21 Nov 2024
MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
Mohammadali Shakerdargah
Shan Lu
Chao Gao
Di Niu
91
0
0
20 Nov 2024
SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models
Jahyun Koo
Yerin Hwang
Yongil Kim
Taegwan Kang
Hyunkyung Bae
Kyomin Jung
83
0
0
25 Oct 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
Yuxian Gu
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
122
5
0
22 Oct 2024
Coherence-Driven Multimodal Safety Dialogue with Active Learning for Embodied Agents
Sabit Hassan
Hye-Young Chung
Xiang Zhi Tan
Malihe Alikhani
104
0
0
18 Oct 2024
G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks
Guibin Zhang
Xinfeng Li
Xiangguo Sun
Guancheng Wan
Miao Yu
Sihang Li
Kun Wang
Dawei Cheng
Dawei Cheng
AAML
AI4CE
100
11
0
15 Oct 2024
GraphCLIP: Enhancing Transferability in Graph Foundation Models for Text-Attributed Graphs
Yun Zhu
Haizhou Shi
Xiaotang Wang
Yongchao Liu
Yaoke Wang
Boci Peng
Chuntao Hong
Siliang Tang
VLM
93
10
0
14 Oct 2024
Agent-Oriented Planning in Multi-Agent Systems
Ao Li
Yuexiang Xie
Songze Li
Fugee Tsung
Bolin Ding
Yaliang Li
AIFin
270
7
0
03 Oct 2024
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Seanie Lee
Haebin Seong
Dong Bok Lee
Minki Kang
Xiaoyin Chen
Dominik Wagner
Yoshua Bengio
Juho Lee
Sung Ju Hwang
85
5
0
02 Oct 2024
Enhancing Screen Time Identification in Children with a Multi-View Vision Language Model and Screen Time Tracker
Xinlong Hou
Sen Shen
Xueshen Li
Xinran Gao
Ziyi Huang
Steven J. Holiday
Matthew R. Cribbet
Susan W. White
Edward Sazonov
Yu Gan
52
0
0
02 Oct 2024
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Aviv Bick
Kevin Y. Li
Eric P. Xing
J. Zico Kolter
Albert Gu
Mamba
73
27
0
19 Aug 2024
Surpassing Cosine Similarity for Multidimensional Comparisons: Dimension Insensitive Euclidean Metric
F. Tessari
Neville Hogan
Neville Hogan
69
3
0
11 Jul 2024
Direct Preference Knowledge Distillation for Large Language Models
Yixing Li
Yuxian Gu
Li Dong
Dequan Wang
Yu Cheng
Furu Wei
62
6
0
28 Jun 2024
ColPali: Efficient Document Retrieval with Vision Language Models
Manuel Faysse
Hugues Sibille
Tony Wu
Bilel Omrani
Gautier Viaud
C´eline Hudelot
Pierre Colombo
VLM
126
23
0
27 Jun 2024
From Instance Training to Instruction Learning: Task Adapters Generation from Instructions
Huanxuan Liao
Yao Xu
Shizhu He
Yuanzhe Zhang
Yanchao Hao
Shengping Liu
Kang Liu
Jun Zhao
92
1
0
18 Jun 2024
Hello Again! LLM-powered Personalized Agent for Long-term Dialogue
Hao Li
Chenghao Yang
An Zhang
Yang Deng
Xiang Wang
Tat-Seng Chua
LLMAG
104
27
0
09 Jun 2024
Curating corpora with classifiers: A case study of clean energy sentiment online
M. V. Arnold
P. Dodds
C. Danforth
51
0
0
04 May 2023
XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
Subhabrata Mukherjee
Ahmed Hassan Awadallah
42
57
0
12 Apr 2020
UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training
Hangbo Bao
Li Dong
Furu Wei
Wenhui Wang
Nan Yang
...
Yu Wang
Songhao Piao
Jianfeng Gao
Ming Zhou
H. Hon
AI4CE
63
394
0
28 Feb 2020
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu
Wangchunshu Zhou
Tao Ge
Furu Wei
Ming Zhou
252
199
0
07 Feb 2020
Unsupervised Cross-lingual Representation Learning at Scale
Alexis Conneau
Kartikay Khandelwal
Naman Goyal
Vishrav Chaudhary
Guillaume Wenzek
Francisco Guzmán
Edouard Grave
Myle Ott
Luke Zettlemoyer
Veselin Stoyanov
128
6,454
0
05 Nov 2019
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
M. Lewis
Yinhan Liu
Naman Goyal
Marjan Ghazvininejad
Abdel-rahman Mohamed
Omer Levy
Veselin Stoyanov
Luke Zettlemoyer
AIMat
VLM
108
10,720
0
29 Oct 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
247
19,824
0
23 Oct 2019
Knowledge Distillation from Internal Representations
Gustavo Aguilar
Yuan Ling
Yu Zhang
Benjamin Yao
Xing Fan
Edward Guo
49
179
0
08 Oct 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
101
7,386
0
02 Oct 2019
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
44
1,838
0
23 Sep 2019
Addressing Semantic Drift in Question Generation for Semi-Supervised Question Answering
Shiyue Zhang
Joey Tianyi Zhou
45
140
0
13 Sep 2019
1
2
Next