1910.10683
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
23 October 2019
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
Papers citing
"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
50 / 9,907 papers shown
Title
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
Yongchang Hao
Yanshuai Cao
Lili Mou
MQ
76
4
0
28 Oct 2024
EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
Shih-yang Liu
Huck Yang
Nai Chit Fung
Charbel Sakr
Hongxu Yin
...
Jan Kautz
Yu-Chun Wang
Pavlo Molchanov
Min-Hung Chen
MQ
127
0
0
28 Oct 2024
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time
Justin Deschenaux
Çağlar Gülçehre
131
5
0
28 Oct 2024
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Julie Kallini
Shikhar Murty
Christopher D. Manning
Christopher Potts
Róbert Csordás
102
4
0
28 Oct 2024
MAMMAL -- Molecular Aligned Multi-Modal Architecture and Language
Yoel Shoshan
Moshiko Raboh
Michal Ozery-Flato
Vadim Ratner
Alex Golts
...
Sharon Kurant
Joseph A. Morrone
Parthasarathy Suryanarayanan
Michal Rosen-Zvi
Efrat Hexter
121
1
0
28 Oct 2024
David and Goliath: Small One-step Model Beats Large Diffusion with Score Post-training
Weijian Luo
C. Zhang
Debing Zhang
Zhengyang Geng
96
4
0
28 Oct 2024
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
Zongyi Li
Shujie Hu
Shujie Liu
Long Zhou
Jeongsoo Choi
Lingwei Meng
Xun Guo
Jiajian Li
H. Ling
Furu Wei
VGen
DiffM
154
7
0
27 Oct 2024
A Stack-Propagation Framework for Low-Resource Personalized Dialogue Generation
Haoyu Song
Weinan Zhang
Kaiyan Zhang
Ting Liu
67
3
0
26 Oct 2024
MatExpert: Decomposing Materials Discovery by Mimicking Human Experts
Qianggang Ding
Santiago Miret
Bang Liu
MoE
71
8
0
26 Oct 2024
Attacks against Abstractive Text Summarization Models through Lead Bias and Influence Functions
Poojitha Thota
Shirin Nilizadeh
69
2
0
26 Oct 2024
Chemical Language Model Linker: blending text and molecules with modular adapters
Yifan Deng
Spencer S. Ericksen
Anthony Gitter
158
2
0
26 Oct 2024
Layer by Layer: Uncovering Where Multi-Task Learning Happens in Instruction-Tuned Large Language Models
Zheng Zhao
Yftah Ziser
Shay B. Cohen
63
2
0
25 Oct 2024
Computational Bottlenecks of Training Small-scale Large Language Models
Saleh Ashkboos
Iman Mirzadeh
Keivan Alizadeh
Mohammad Hossein Sekhavat
Moin Nabi
Mehrdad Farajtabar
Fartash Faghri
61
1
0
25 Oct 2024
Ensembling Finetuned Language Models for Text Classification
Sebastian Pineda Arango
Maciej Janowski
Lennart Purucker
Arber Zela
Frank Hutter
Josif Grabocka
77
0
0
25 Oct 2024
Interleaving Text and Number Embeddings to Solve Mathemathics Problems
Marvin Alberts
Gianmarco Gabrieli
Irina Espejo Morales
51
2
0
25 Oct 2024
Two are better than one: Context window extension with multi-grained self-injection
Wei Han
Pan Zhou
Soujanya Poria
Shuicheng Yan
70
0
0
25 Oct 2024
Natural Language Processing for the Legal Domain: A Survey of Tasks, Datasets, Models, and Challenges
Farid Ariai
Gianluca Demartini
ELM
AILaw
VLM
88
7
0
25 Oct 2024
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
Haocheng Xi
Han Cai
Ligeng Zhu
Yaojie Lu
Kurt Keutzer
Jianfei Chen
Song Han
MQ
173
11
0
25 Oct 2024
Retrieving Implicit and Explicit Emotional Events Using Large Language Models
Guimin Hu
Hasti Seifi
100
1
0
24 Oct 2024
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
Ruisi Cai
Yeonju Ro
Geon-Woo Kim
Peihao Wang
Babak Ehteshami Bejnordi
Aditya Akella
Ziyi Wang
MoE
80
6
0
24 Oct 2024
RSA-Control: A Pragmatics-Grounded Lightweight Controllable Text Generation Framework
Yifan Wang
Vera Demberg
72
1
0
24 Oct 2024
TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
Yuhang Li
Priyadarshini Panda
MQ
73
1
0
24 Oct 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
David Ortiz-Perez
Manuel Benavent-Lledo
José García Rodríguez
David Tomás
M. Flores Vizcaya-Moreno
69
1
0
24 Oct 2024
Dynamic Vocabulary Pruning in Early-Exit LLMs
Jort Vincenti
Karim Abdel Sadek
Joan Velja
Matteo Nulli
Metod Jazbec
55
0
0
24 Oct 2024
Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
Omer Nahum
Nitay Calderon
Orgad Keller
Idan Szpektor
Roi Reichart
66
4
0
24 Oct 2024
Towards Visual Text Design Transfer Across Languages
Yejin Choi
Jiwan Chung
Sumin Shim
Giyeong Oh
Youngjae Yu
VLM
DiffM
67
1
0
24 Oct 2024
Smart ETL and LLM-based contents classification: the European Smart Tourism Tools Observatory experience
Diogo Cosme
António Galvão
Fernando Brito e Abreu
39
0
0
24 Oct 2024
Knowledge Distillation Using Frontier Open-source LLMs: Generalizability and the Role of Synthetic Data
Anup Shirgaonkar
Nikhil Pandey
Nazmiye Ceren Abay
Tolga Aktas
Vijay Aski
ALM
SyDa
65
1
0
24 Oct 2024
LOGO -- Long cOntext aliGnment via efficient preference Optimization
Zecheng Tang
Zechen Sun
Juntao Li
Qiaoming Zhu
Min Zhang
79
2
0
24 Oct 2024
The Nature of Mathematical Modeling and Probabilistic Optimization Engineering in Generative AI
Fulu Li
27
0
0
24 Oct 2024
Building Dialogue Understanding Models for Low-resource Language Indonesian from Scratch
Donglin Di
Weinan Zhang
Yue Zhang
Fanglin Wang
87
1
0
24 Oct 2024
Link, Synthesize, Retrieve: Universal Document Linking for Zero-Shot Information Retrieval
Dae Yon Hwang
Bilal Taha
Harshit Pande
Yaroslav Nechaev
SyDa
75
0
0
24 Oct 2024
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
Liwen Wang
Sheng Chen
Linnan Jiang
Shu Pan
Runze Cai
Sen Yang
Fei Yang
184
7
0
24 Oct 2024
Structure Language Models for Protein Conformation Generation
Jiarui Lu
Xiaoyin Chen
Stephen Zhewen Lu
Chence Shi
Hongyu Guo
Yoshua Bengio
Xiangbo Shu
DiffM
102
5
0
24 Oct 2024
Scaling up Masked Diffusion Models on Text
Shen Nie
Fengqi Zhu
Chao Du
Tianyu Pang
Qian Liu
Guangtao Zeng
Min Lin
Chongxuan Li
AI4CE
217
30
0
24 Oct 2024
LEGO: Language Model Building Blocks
Shrenik Bhansali
Alwin Jin
Tyler Lizzo
Larry Heck
31
0
0
23 Oct 2024
Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases
Anna Glazkova
Dmitry A. Morozov
Timur Garipov
90
0
0
23 Oct 2024
Scalable Ranked Preference Optimization for Text-to-Image Generation
Shyamgopal Karthik
Huseyin Coskun
Zeynep Akata
Sergey Tulyakov
J. Ren
Anil Kag
EGVM
111
9
0
23 Oct 2024
ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference
Xin He
Shunkang Zhang
Yuxin Wang
Haiyan Yin
Zihao Zeng
Shaohuai Shi
Zhenheng Tang
Xiaowen Chu
Ivor Tsang
Ong Yew Soon
MoE
102
7
0
23 Oct 2024
Future Token Prediction -- Causal Language Modelling with Per-Token Semantic State Vector for Multi-Token Prediction
Nicholas Walker
63
0
0
23 Oct 2024
Leveraging the Domain Adaptation of Retrieval Augmented Generation Models for Question Answering and Reducing Hallucination
Salman Rakin
Md. A. R. Shibly
Zahin M. Hossain
Zeeshan Khan
Md. Mostofa Akbar
73
3
0
23 Oct 2024
Responsible Multilingual Large Language Models: A Survey of Development, Applications, and Societal Impact
Junhua Liu
Bin Fu
LRM
44
1
0
23 Oct 2024
Is artificial intelligence still intelligence? LLMs generalize to novel adjective-noun pairs, but don't mimic the full human distribution
Hayley Ross
Kathryn Davidson
Najoung Kim
66
3
0
23 Oct 2024
ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment
Elyas Obbad
Iddah Mlauzi
Alycia Lee
Rylan Schaeffer
Kamal Obbad
Suhana Bedi
Sanmi Koyejo
CVBM
146
0
0
23 Oct 2024
Beware of Calibration Data for Pruning Large Language Models
Yixin Ji
Yang Xiang
Juntao Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
96
2
0
23 Oct 2024
Closed-form merging of parameter-efficient modules for Federated Continual Learning
Riccardo Salami
Pietro Buzzega
Matteo Mosconi
Jacopo Bonato
Luigi Sabetta
Simone Calderara
FedML
MoMe
CLL
111
4
0
23 Oct 2024
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
Jinghan Jia
Jiancheng Liu
Yihua Zhang
Parikshit Ram
Nathalie Baracaldo
Sijia Liu
MU
160
8
0
23 Oct 2024
Scalable Influence and Fact Tracing for Large Language Model Pretraining
Tyler A. Chang
Dheeraj Rajagopal
Tolga Bolukbasi
Lucas Dixon
Ian Tenney
TDI
94
5
0
22 Oct 2024
Captions Speak Louder than Images (CASLIE): Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data
Xinyi Ling
Bo Peng
Hanwen Du
Zhihui Zhu
Xia Ning
107
0
0
22 Oct 2024
From Attention to Activation: Unravelling the Enigmas of Large Language Models
Prannay Kaul
Chengcheng Ma
Ismail Elezi
Jiankang Deng
129
2
0
22 Oct 2024