DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

2 October 2019

Papers citing "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter"

50 / 131 papers shown

Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection. Md. Mithun Hossain, Md. Shakil Hossain, Sudipto Chaki, M. F. Mridha. 78 0 0. 25 May 2025
Discretization-free Multicalibration through Loss Minimization over Tree Ensembles. Hongyi Henry Jin, Zijun Ding, Dung Daniel Ngo, Zhiwei Steven Wu. 49 0 0. 23 May 2025
Locality-Sensitive Hashing for Efficient Hard Negative Sampling in Contrastive Learning. Fabian Deuser, Philipp Hausenblas, Hannah Schieber, Daniel Roth, Martin Werner, Norbert Oswald. 68 0 0. 23 May 2025
On Multilingual Encoder Language Model Compression for Low-Resource Languages. Daniil Gurgurov, Michal Gregor, Josef van Genabith, Simon Ostermann. 72 0 0. 22 May 2025
MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities. Jingxue Chen, Qingkun Tang, Qianchun Lu, Siyuan Fang. 43 0 0. 17 May 2025
ADALog: Adaptive Unsupervised Anomaly detection in Logs with Self-attention Masked Language Model. Przemek Pospieszny, Wojciech Mormul, Karolina Szyndler, Sanjeev Kumar. 56 0 0. 15 May 2025
KDH-MLTC: Knowledge Distillation for Healthcare Multi-Label Text Classification. Hajar Sakai, Sarah Lam. VLM. 63 0 0. 12 May 2025
Semantic Retention and Extreme Compression in LLMs: Can We Have Both? Stanislas Laborde, Martin Cousseau, Antoun Yaacoub, Lionel Prevost. MQ. 46 0 0. 12 May 2025
AI-Enabled Accurate Non-Invasive Assessment of Pulmonary Hypertension Progression via Multi-Modal Echocardiography. Jiewen Yang, Taoran Huang, Shangwei Ding, Xiaowei Xu, Qinhua Zhao, ..., Bin Pu, Jiexuan Zheng, Caojin Zhang, Hongwen Fei, Xuelong Li. 41 0 0. 12 May 2025
Replay-Based Continual Learning with Dual-Layered Distillation and a Streamlined U-Net for Efficient Text-to-Image Generation. Md. Naimur Asif Borno, Md Sakib Hossain Shovon, Asmaa Soliman Al-Moisheer, Mohammad Ali Moni. 57 0 0. 11 May 2025
Cape: Context-Aware Prompt Perturbation Mechanism with Differential Privacy. Haoqi Wu, Wei Dai, Li Wang, Qiang Yan. SILM. 64 1 0. 09 May 2025
Revisiting the MIMIC-IV Benchmark: Experiments Using Language Models for Electronic Health Records. Jesus Lovon, Thouria Ben-Haddi, Jules Di Scala, José G. Moreno, L. Tamine. 89 2 0. 29 Apr 2025
Ask2Loc: Learning to Locate Instructional Visual Answers by Asking Questions. Chang Zong, Bin Li, Shoujun Zhou, Jian Wan, Lei Zhang. 332 0 0. 22 Apr 2025
Empirical Evaluation of Knowledge Distillation from Transformers to Subquadratic Language Models. Patrick Haller, Jonas Golde, Alan Akbik. 53 0 0. 19 Apr 2025
From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs. Jiliang Ni, Jiachen Pu, Zhongyi Yang, Kun Zhou, Hui Wang, Xiaoliang Xiao, Dakui Wang, Xin Li, Jingfeng Luo, Conggang Hu. 54 0 0. 18 Apr 2025
Prompt Optimization with Logged Bandit Data. Haruka Kiyohara, Daniel Yiming Cao, Yuta Saito, Thorsten Joachims. 119 0 0. 03 Apr 2025
Catch Me if You Search: When Contextual Web Search Results Affect the Detection of Hallucinations. Mahjabin Nahar, Eun-Ju Lee, Jin Won Park, Dongwon Lee. HILM. 96 0 0. 01 Apr 2025
Improving User Behavior Prediction: Leveraging Annotator Metadata in Supervised Machine Learning Models. Lynnette Ng, Kokil Jaidka, Kaiyuan Tay, Hansin Ahuja, Niyati Chhaya. 87 0 0. 26 Mar 2025
A Generalist Hanabi Agent. Arjun Vaithilingam Sudhakar, Hadi Nekoei, Mathieu Reymond, Miao Liu, Janarthanan Rajendran, Sarath Chandar. 358 0 0. 17 Mar 2025
GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation. Tao Feng, Yihang Sun, Jiaxuan You. 98 1 0. 16 Mar 2025
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis. Jonas Belouadi, Eddy Ilg, Margret Keuper, Hideki Tanaka, Masao Utiyama, Raj Dabre, Steffen Eger, Simone Paolo Ponzetto. 81 0 0. 14 Mar 2025
Training Plug-n-Play Knowledge Modules with Deep Context Distillation. Lucas Caccia, Alan Ansell, Edoardo Ponti, Ivan Vulić, Alessandro Sordoni. SyDa. 391 0 0. 11 Mar 2025
Is My Text in Your AI Model? Gradient-based Membership Inference Test applied to LLMs. Gonzalo Mancera, Daniel DeAlcala, Julian Fierrez, Ruben Tolosana, Aythami Morales. 56 1 0. 10 Mar 2025
Development and Enhancement of Text-to-Image Diffusion Models. Rajdeep Roshan Sahu. VLM. 87 36 0. 07 Mar 2025
The Society of HiveMind: Multi-Agent Optimization of Foundation Model Swarms to Unlock the Potential of Collective Intelligence. Noah Mamie, Susie Xi Rao. LLMAG AI4CE. 83 0 0. 07 Mar 2025
Encryption-Friendly LLM Architecture. Donghwan Rho, Taeseong Kim, Minje Park, Jung Woo Kim, Hyunsik Chae, Jung Hee Cheon, Ernest K. Ryu. 103 2 0. 24 Feb 2025
Fully automatic extraction of morphological traits from the Web: utopia or reality? Diego Marcos, Robert van de Vlasakker, Ioannis Athanasiadis, P. Bonnet, Hervé Goëau, Alexis Joly, W. Daniel Kissling, César Leblanc, André S. J. van Proosdij, Konstantinos P. Panousis. 95 3 0. 24 Feb 2025
A Survey of Model Architectures in Information Retrieval. Zhichao Xu, Fengran Mo, Zhiqi Huang, Crystina Zhang, Puxuan Yu, Bei Wang, Jimmy J. Lin, Vivek Srikumar. KELM 3DV. 94 2 0. 21 Feb 2025
SPEX: Scaling Feature Interaction Explanations for LLMs. Justin Singh Kang, Landon Butler, Abhineet Agarwal, Yigit Efe Erginbas, Ramtin Pedarsani, Kannan Ramchandran, Bin Yu. VLM LRM. 102 2 0. 20 Feb 2025
Prompt-based Depth Pruning of Large Language Models. Juyun Wee, Minjae Park, Jaeho Lee. VLM. 118 0 0. 17 Feb 2025
RideKE: Leveraging Low-Resource, User-Generated Twitter Content for Sentiment and Emotion Detection in Kenyan Code-Switched Dataset. Naome A. Etori, Maria Gini. 118 2 0. 10 Feb 2025
SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation. Saurabh Kumar Pandey, S. Vashistha, Debrup Das, Somak Aditya, Monojit Choudhury. AAML. 95 0 0. 10 Feb 2025
Few-shot LLM Synthetic Data with Distribution Matching. Jiyuan Ren, Zhaocheng Du, Zhihao Wen, Qinglin Jia, Sunhao Dai, Chuhan Wu, Zhenhua Dong. SyDa. 112 0 0. 09 Feb 2025
DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control. Junjie Wen, Yinlin Zhu, Jinming Li, Zhibin Tang, Yaxin Peng, Feifei Feng. VLM. 72 18 0. 09 Feb 2025
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs. Nicolas Boizard, Kevin El Haddad, Céline Hudelot, Pierre Colombo. 100 15 0. 28 Jan 2025
Merino: Entropy-driven Design for Generative Language Models on IoT Devices. Youpeng Zhao, Ming Lin, Huadong Tang, Qiang Wu, Jun Wang. 93 0 0. 28 Jan 2025
The Effect of Similarity Measures on Accurate Stability Estimates for Local Surrogate Models in Text-based Explainable AI. Christopher Burger, Charles Walter, Thai Le. AAML. 157 1 0. 20 Jan 2025
AIMA at SemEval-2024 Task 3: Simple Yet Powerful Emotion Cause Pair Analysis. Alireza Ghahramani Kure, Mahshid Dehghani, Mohammad Mahdi Abootorabi, Nona Ghazizadeh, Seyed Arshan Dalili, Ehsaneddin Asgari. 68 1 0. 19 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine. Wasif Khan, Seowung Leem, Kyle B. See, Joshua K. Wong, Shaoting Zhang, R. Fang. AI4CE LM&MA VLM. 158 21 0. 17 Jan 2025
WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning. Rajath Rao, Adithya Ganesan, Oscar Kjell, Jonah Luby, Akshay Raghavan, ..., B. Luft, Camilo Ruggero, Neville Ryant, R. Kotov, H. Andrew Schwartz. 62 0 0. 15 Jan 2025
Attention Mechanisms Don't Learn Additive Models: Rethinking Feature Importance for Transformers. Tobias Leemann, Alina Fastowski, Felix Pfeiffer, Gjergji Kasneci. 92 5 0. 10 Jan 2025
LLM Reasoning Engine: Specialized Training for Enhanced Mathematical Reasoning. Shuguang Chen, Guang Lin. LRM. 346 0 0. 28 Dec 2024
Unifying Feature-Based Explanations with Functional ANOVA and Cooperative Game Theory. Fabian Fumagalli, Maximilian Muschalik, Eyke Hüllermeier, Barbara Hammer, J. Herbinger. FAtt. 92 2 0. 22 Dec 2024
FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized RLHF. Flint Xiaofeng Fan, Cheston Tan, Yew-Soon Ong, Roger Wattenhofer, Wei Tsang Ooi. 96 1 0. 20 Dec 2024
Perception of Visual Content: Differences Between Humans and Foundation Models. Nardiena A. Pratama, Shaoyang Fan, Gianluca Demartini. VLM. 114 0 0. 28 Nov 2024
Multi-Label Bayesian Active Learning with Inter-Label Relationships. Yuanyuan Qi, Jueqing Lu, Xiaohao Yang, Joanne Enticott, Lan Du. 116 0 0. 26 Nov 2024
LAGUNA: LAnguage Guided UNsupervised Adaptation with structured spaces. Anxhelo Diko, Antonino Furnari, Luigi Cinque, G. Farinella. 236 0 0. 23 Nov 2024
KinMo: Kinematic-aware Human Motion Understanding and Generation. Pengfei Zhang, Pinxin Liu, Hyeongwoo Kim, Pablo Garrido, Bindita Chaudhuri. 118 2 0. 23 Nov 2024
FuseGPT: Learnable Layers Fusion of Generative Pre-trained Transformers. Zehua Pei, Hui-Ling Zhen, Xianzhi Yu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu. AI4CE. 134 3 0. 21 Nov 2024
CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs. Suhas S Kowshik, Abhishek Divekar, Vijit Malik. SyDa. 80 0 0. 13 Nov 2024