
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

23 October 2019

Sharan Narang

Papers citing "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

50 / 9,907 papers shown

Title
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks Yongchang Hao Yanshuai Cao Lili Mou MQ 76 4 0 28 Oct 2024
EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation Shih-yang Liu Huck Yang Nai Chit Fung Charbel Sakr Hongxu Yin ... Jan Kautz Yu-Chun Wang Pavlo Molchanov Min-Hung Chen MQ 127 0 0 28 Oct 2024
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time Justin Deschenaux Çağlar Gülçehre 131 5 0 28 Oct 2024
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models Julie Kallini Shikhar Murty Christopher D. Manning Christopher Potts Róbert Csordás 102 4 0 28 Oct 2024
MAMMAL -- Molecular Aligned Multi-Modal Architecture and Language Yoel Shoshan Moshiko Raboh Michal Ozery-Flato Vadim Ratner Alex Golts ... Sharon Kurant Joseph A. Morrone Parthasarathy Suryanarayanan Michal Rosen-Zvi Efrat Hexter 121 1 0 28 Oct 2024
David and Goliath: Small One-step Model Beats Large Diffusion with Score Post-training Weijian Luo C. Zhang Debing Zhang Zhengyang Geng 96 4 0 28 Oct 2024
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation Zongyi Li Shujie Hu Shujie Liu Long Zhou Jeongsoo Choi Lingwei Meng Xun Guo Jiajian Li H. Ling Furu Wei VGen DiffM 154 7 0 27 Oct 2024
A Stack-Propagation Framework for Low-Resource Personalized Dialogue Generation Haoyu Song Weinan Zhang Kaiyan Zhang Ting Liu 67 3 0 26 Oct 2024
MatExpert: Decomposing Materials Discovery by Mimicking Human Experts Qianggang Ding Santiago Miret Bang Liu MoE 71 8 0 26 Oct 2024
Attacks against Abstractive Text Summarization Models through Lead Bias and Influence Functions Poojitha Thota Shirin Nilizadeh 69 2 0 26 Oct 2024
Chemical Language Model Linker: blending text and molecules with modular adapters Yifan Deng Spencer S. Ericksen Anthony Gitter 158 2 0 26 Oct 2024
Layer by Layer: Uncovering Where Multi-Task Learning Happens in Instruction-Tuned Large Language Models Zheng Zhao Yftah Ziser Shay B. Cohen 63 2 0 25 Oct 2024
Computational Bottlenecks of Training Small-scale Large Language Models Saleh Ashkboos Iman Mirzadeh Keivan Alizadeh Mohammad Hossein Sekhavat Moin Nabi Mehrdad Farajtabar Fartash Faghri 61 1 0 25 Oct 2024
Ensembling Finetuned Language Models for Text Classification Sebastian Pineda Arango Maciej Janowski Lennart Purucker Arber Zela Frank Hutter Josif Grabocka 77 0 0 25 Oct 2024
Interleaving Text and Number Embeddings to Solve Mathemathics Problems Marvin Alberts Gianmarco Gabrieli Irina Espejo Morales 51 2 0 25 Oct 2024
Two are better than one: Context window extension with multi-grained self-injection Wei Han Pan Zhou Soujanya Poria Shuicheng Yan 70 0 0 25 Oct 2024
Natural Language Processing for the Legal Domain: A Survey of Tasks, Datasets, Models, and Challenges Farid Ariai Gianluca Demartini ELM AILaw VLM 88 7 0 25 Oct 2024
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training Haocheng Xi Han Cai Ligeng Zhu Yaojie Lu Kurt Keutzer Jianfei Chen Song Han MQ 173 11 0 25 Oct 2024
Retrieving Implicit and Explicit Emotional Events Using Large Language Models Guimin Hu Hasti Seifi 100 1 0 24 Oct 2024
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design Ruisi Cai Yeonju Ro Geon-Woo Kim Peihao Wang Babak Ehteshami Bejnordi Aditya Akella Ziyi Wang MoE 80 6 0 24 Oct 2024
RSA-Control: A Pragmatics-Grounded Lightweight Controllable Text Generation Framework Yifan Wang Vera Demberg 72 1 0 24 Oct 2024
TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction Yuhang Li Priyadarshini Panda MQ 73 1 0 24 Oct 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques David Ortiz-Perez Manuel Benavent-Lledo José García Rodríguez David Tomás M. Flores Vizcaya-Moreno 69 1 0 24 Oct 2024
Dynamic Vocabulary Pruning in Early-Exit LLMs Jort Vincenti Karim Abdel Sadek Joan Velja Matteo Nulli Metod Jazbec 55 0 0 24 Oct 2024
Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance Omer Nahum Nitay Calderon Orgad Keller Idan Szpektor Roi Reichart 66 4 0 24 Oct 2024
Towards Visual Text Design Transfer Across Languages Yejin Choi Jiwan Chung Sumin Shim Giyeong Oh Youngjae Yu VLM DiffM 67 1 0 24 Oct 2024
Smart ETL and LLM-based contents classification: the European Smart Tourism Tools Observatory experience Diogo Cosme António Galvão Fernando Brito e Abreu 39 0 0 24 Oct 2024
Knowledge Distillation Using Frontier Open-source LLMs: Generalizability and the Role of Synthetic Data Anup Shirgaonkar Nikhil Pandey Nazmiye Ceren Abay Tolga Aktas Vijay Aski ALM SyDa 65 1 0 24 Oct 2024
LOGO -- Long cOntext aliGnment via efficient preference Optimization Zecheng Tang Zechen Sun Juntao Li Qiaoming Zhu Min Zhang 79 2 0 24 Oct 2024
The Nature of Mathematical Modeling and Probabilistic Optimization Engineering in Generative AI Fulu Li 27 0 0 24 Oct 2024
Building Dialogue Understanding Models for Low-resource Language Indonesian from Scratch Donglin Di Weinan Zhang Yue Zhang Fanglin Wang 87 1 0 24 Oct 2024
Link, Synthesize, Retrieve: Universal Document Linking for Zero-Shot Information Retrieval Dae Yon Hwang Bilal Taha Harshit Pande Yaroslav Nechaev SyDa 75 0 0 24 Oct 2024
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies Liwen Wang Sheng Chen Linnan Jiang Shu Pan Runze Cai Sen Yang Fei Yang 184 7 0 24 Oct 2024
Structure Language Models for Protein Conformation Generation Jiarui Lu Xiaoyin Chen Stephen Zhewen Lu Chence Shi Hongyu Guo Yoshua Bengio Xiangbo Shu DiffM 102 5 0 24 Oct 2024
Scaling up Masked Diffusion Models on Text Shen Nie Fengqi Zhu Chao Du Tianyu Pang Qian Liu Guangtao Zeng Min Lin Chongxuan Li AI4CE 217 30 0 24 Oct 2024
LEGO: Language Model Building Blocks Shrenik Bhansali Alwin Jin Tyler Lizzo Larry Heck 31 0 0 23 Oct 2024
Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases Anna Glazkova Dmitry A. Morozov Timur Garipov 90 0 0 23 Oct 2024
Scalable Ranked Preference Optimization for Text-to-Image Generation Shyamgopal Karthik Huseyin Coskun Zeynep Akata Sergey Tulyakov J. Ren Anil Kag EGVM 111 9 0 23 Oct 2024
ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference Xin He Shunkang Zhang Yuxin Wang Haiyan Yin Zihao Zeng Shaohuai Shi Zhenheng Tang Xiaowen Chu Ivor Tsang Ong Yew Soon MoE 102 7 0 23 Oct 2024
Future Token Prediction -- Causal Language Modelling with Per-Token Semantic State Vector for Multi-Token Prediction Nicholas Walker 63 0 0 23 Oct 2024
Leveraging the Domain Adaptation of Retrieval Augmented Generation Models for Question Answering and Reducing Hallucination Salman Rakin Md. A. R. Shibly Zahin M. Hossain Zeeshan Khan Md. Mostofa Akbar 73 3 0 23 Oct 2024
Responsible Multilingual Large Language Models: A Survey of Development, Applications, and Societal Impact Junhua Liu Bin Fu LRM 44 1 0 23 Oct 2024
Is artificial intelligence still intelligence? LLMs generalize to novel adjective-noun pairs, but don't mimic the full human distribution Hayley Ross Kathryn Davidson Najoung Kim 66 3 0 23 Oct 2024
ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment Elyas Obbad Iddah Mlauzi Alycia Lee Rylan Schaeffer Kamal Obbad Suhana Bedi Sanmi Koyejo CVBM 146 0 0 23 Oct 2024
Beware of Calibration Data for Pruning Large Language Models Yixin Ji Yang Xiang Juntao Li Qingrong Xia Ping Li Xinyu Duan Zhefeng Wang Min Zhang 96 2 0 23 Oct 2024
Closed-form merging of parameter-efficient modules for Federated Continual Learning Riccardo Salami Pietro Buzzega Matteo Mosconi Jacopo Bonato Luigi Sabetta Simone Calderara FedML MoMe CLL 111 4 0 23 Oct 2024
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models Jinghan Jia Jiancheng Liu Yihua Zhang Parikshit Ram Nathalie Baracaldo Sijia Liu MU 160 8 0 23 Oct 2024
Scalable Influence and Fact Tracing for Large Language Model Pretraining Tyler A. Chang Dheeraj Rajagopal Tolga Bolukbasi Lucas Dixon Ian Tenney TDI 94 5 0 22 Oct 2024
Captions Speak Louder than Images (CASLIE): Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data Xinyi Ling Bo Peng Hanwen Du Zhihui Zhu Xia Ning 107 0 0 22 Oct 2024
From Attention to Activation: Unravelling the Enigmas of Large Language Models Prannay Kaul Chengcheng Ma Ismail Elezi Jiankang Deng 129 2 0 22 Oct 2024