ResearchTrend.AI
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

23 October 2019
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
    AIMat

Papers citing "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

50 / 9,907 papers shown
Title
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
Yongchang Hao
Yanshuai Cao
Lili Mou
MQ
76
4
0
28 Oct 2024
EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
Shih-yang Liu
Huck Yang
Nai Chit Fung
Charbel Sakr
Hongxu Yin
...
Jan Kautz
Yu-Chun Wang
Pavlo Molchanov
Min-Hung Chen
MQ
127
0
0
28 Oct 2024
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time
Justin Deschenaux
Çağlar Gülçehre
131
5
0
28 Oct 2024
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Julie Kallini
Shikhar Murty
Christopher D. Manning
Christopher Potts
Róbert Csordás
102
4
0
28 Oct 2024
MAMMAL -- Molecular Aligned Multi-Modal Architecture and Language
Yoel Shoshan
Moshiko Raboh
Michal Ozery-Flato
Vadim Ratner
Alex Golts
...
Sharon Kurant
Joseph A. Morrone
Parthasarathy Suryanarayanan
Michal Rosen-Zvi
Efrat Hexter
121
1
0
28 Oct 2024
David and Goliath: Small One-step Model Beats Large Diffusion with Score Post-training
Weijian Luo
C. Zhang
Debing Zhang
Zhengyang Geng
96
4
0
28 Oct 2024
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
Zongyi Li
Shujie Hu
Shujie Liu
Long Zhou
Jeongsoo Choi
Lingwei Meng
Xun Guo
Jiajian Li
H. Ling
Furu Wei
VGen, DiffM
154
7
0
27 Oct 2024
A Stack-Propagation Framework for Low-Resource Personalized Dialogue Generation
Haoyu Song
Weinan Zhang
Kaiyan Zhang
Ting Liu
67
3
0
26 Oct 2024
MatExpert: Decomposing Materials Discovery by Mimicking Human Experts
Qianggang Ding
Santiago Miret
Bang Liu
MoE
71
8
0
26 Oct 2024
Attacks against Abstractive Text Summarization Models through Lead Bias and Influence Functions
Poojitha Thota
Shirin Nilizadeh
69
2
0
26 Oct 2024
Chemical Language Model Linker: blending text and molecules with modular adapters
Yifan Deng
Spencer S. Ericksen
Anthony Gitter
158
2
0
26 Oct 2024
Layer by Layer: Uncovering Where Multi-Task Learning Happens in Instruction-Tuned Large Language Models
Zheng Zhao
Yftah Ziser
Shay B. Cohen
63
2
0
25 Oct 2024
Computational Bottlenecks of Training Small-scale Large Language Models
Saleh Ashkboos
Iman Mirzadeh
Keivan Alizadeh
Mohammad Hossein Sekhavat
Moin Nabi
Mehrdad Farajtabar
Fartash Faghri
61
1
0
25 Oct 2024
Ensembling Finetuned Language Models for Text Classification
Sebastian Pineda Arango
Maciej Janowski
Lennart Purucker
Arber Zela
Frank Hutter
Josif Grabocka
77
0
0
25 Oct 2024
Interleaving Text and Number Embeddings to Solve Mathemathics Problems
Marvin Alberts
Gianmarco Gabrieli
Irina Espejo Morales
51
2
0
25 Oct 2024
Two are better than one: Context window extension with multi-grained self-injection
Wei Han
Pan Zhou
Soujanya Poria
Shuicheng Yan
70
0
0
25 Oct 2024
Natural Language Processing for the Legal Domain: A Survey of Tasks, Datasets, Models, and Challenges
Farid Ariai
Gianluca Demartini
ELM, AILaw, VLM
88
7
0
25 Oct 2024
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
Haocheng Xi
Han Cai
Ligeng Zhu
Yaojie Lu
Kurt Keutzer
Jianfei Chen
Song Han
MQ
173
11
0
25 Oct 2024
Retrieving Implicit and Explicit Emotional Events Using Large Language Models
Guimin Hu
Hasti Seifi
100
1
0
24 Oct 2024
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
Ruisi Cai
Yeonju Ro
Geon-Woo Kim
Peihao Wang
Babak Ehteshami Bejnordi
Aditya Akella
Ziyi Wang
MoE
80
6
0
24 Oct 2024
RSA-Control: A Pragmatics-Grounded Lightweight Controllable Text Generation Framework
Yifan Wang
Vera Demberg
72
1
0
24 Oct 2024
TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
Yuhang Li
Priyadarshini Panda
MQ
73
1
0
24 Oct 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
David Ortiz-Perez
Manuel Benavent-Lledo
José García Rodríguez
David Tomás
M. Flores Vizcaya-Moreno
69
1
0
24 Oct 2024
Dynamic Vocabulary Pruning in Early-Exit LLMs
Jort Vincenti
Karim Abdel Sadek
Joan Velja
Matteo Nulli
Metod Jazbec
55
0
0
24 Oct 2024
Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
Omer Nahum
Nitay Calderon
Orgad Keller
Idan Szpektor
Roi Reichart
66
4
0
24 Oct 2024
Towards Visual Text Design Transfer Across Languages
Yejin Choi
Jiwan Chung
Sumin Shim
Giyeong Oh
Youngjae Yu
VLM, DiffM
67
1
0
24 Oct 2024
Smart ETL and LLM-based contents classification: the European Smart Tourism Tools Observatory experience
Diogo Cosme
António Galvão
Fernando Brito e Abreu
39
0
0
24 Oct 2024
Knowledge Distillation Using Frontier Open-source LLMs: Generalizability and the Role of Synthetic Data
Anup Shirgaonkar
Nikhil Pandey
Nazmiye Ceren Abay
Tolga Aktas
Vijay Aski
ALM, SyDa
65
1
0
24 Oct 2024
LOGO -- Long cOntext aliGnment via efficient preference Optimization
Zecheng Tang
Zechen Sun
Juntao Li
Qiaoming Zhu
Min Zhang
79
2
0
24 Oct 2024
The Nature of Mathematical Modeling and Probabilistic Optimization Engineering in Generative AI
Fulu Li
27
0
0
24 Oct 2024
Building Dialogue Understanding Models for Low-resource Language Indonesian from Scratch
Donglin Di
Weinan Zhang
Yue Zhang
Fanglin Wang
87
1
0
24 Oct 2024
Link, Synthesize, Retrieve: Universal Document Linking for Zero-Shot Information Retrieval
Dae Yon Hwang
Bilal Taha
Harshit Pande
Yaroslav Nechaev
SyDa
75
0
0
24 Oct 2024
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
Liwen Wang
Sheng Chen
Linnan Jiang
Shu Pan
Runze Cai
Sen Yang
Fei Yang
184
7
0
24 Oct 2024
Structure Language Models for Protein Conformation Generation
Jiarui Lu
Xiaoyin Chen
Stephen Zhewen Lu
Chence Shi
Hongyu Guo
Yoshua Bengio
Xiangbo Shu
DiffM
102
5
0
24 Oct 2024
Scaling up Masked Diffusion Models on Text
Shen Nie
Fengqi Zhu
Chao Du
Tianyu Pang
Qian Liu
Guangtao Zeng
Min Lin
Chongxuan Li
AI4CE
217
30
0
24 Oct 2024
LEGO: Language Model Building Blocks
Shrenik Bhansali
Alwin Jin
Tyler Lizzo
Larry Heck
31
0
0
23 Oct 2024
Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases
Anna Glazkova
Dmitry A. Morozov
Timur Garipov
90
0
0
23 Oct 2024
Scalable Ranked Preference Optimization for Text-to-Image Generation
Shyamgopal Karthik
Huseyin Coskun
Zeynep Akata
Sergey Tulyakov
J. Ren
Anil Kag
EGVM
111
9
0
23 Oct 2024
ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference
Xin He
Shunkang Zhang
Yuxin Wang
Haiyan Yin
Zihao Zeng
Shaohuai Shi
Zhenheng Tang
Xiaowen Chu
Ivor Tsang
Ong Yew Soon
MoE
102
7
0
23 Oct 2024
Future Token Prediction -- Causal Language Modelling with Per-Token Semantic State Vector for Multi-Token Prediction
Nicholas Walker
63
0
0
23 Oct 2024
Leveraging the Domain Adaptation of Retrieval Augmented Generation Models for Question Answering and Reducing Hallucination
Salman Rakin
Md. A. R. Shibly
Zahin M. Hossain
Zeeshan Khan
Md. Mostofa Akbar
73
3
0
23 Oct 2024
Responsible Multilingual Large Language Models: A Survey of Development, Applications, and Societal Impact
Junhua Liu
Bin Fu
LRM
44
1
0
23 Oct 2024
Is artificial intelligence still intelligence? LLMs generalize to novel adjective-noun pairs, but don't mimic the full human distribution
Hayley Ross
Kathryn Davidson
Najoung Kim
66
3
0
23 Oct 2024
ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment
Elyas Obbad
Iddah Mlauzi
Alycia Lee
Rylan Schaeffer
Kamal Obbad
Suhana Bedi
Sanmi Koyejo
CVBM
146
0
0
23 Oct 2024
Beware of Calibration Data for Pruning Large Language Models
Yixin Ji
Yang Xiang
Juntao Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
96
2
0
23 Oct 2024
Closed-form merging of parameter-efficient modules for Federated Continual Learning
Riccardo Salami
Pietro Buzzega
Matteo Mosconi
Jacopo Bonato
Luigi Sabetta
Simone Calderara
FedML, MoMe, CLL
111
4
0
23 Oct 2024
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
Jinghan Jia
Jiancheng Liu
Yihua Zhang
Parikshit Ram
Nathalie Baracaldo
Sijia Liu
MU
160
8
0
23 Oct 2024
Scalable Influence and Fact Tracing for Large Language Model Pretraining
Tyler A. Chang
Dheeraj Rajagopal
Tolga Bolukbasi
Lucas Dixon
Ian Tenney
TDI
94
5
0
22 Oct 2024
Captions Speak Louder than Images (CASLIE): Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data
Xinyi Ling
Bo Peng
Hanwen Du
Zhihui Zhu
Xia Ning
107
0
0
22 Oct 2024
From Attention to Activation: Unravelling the Enigmas of Large Language Models
Prannay Kaul
Chengcheng Ma
Ismail Elezi
Jiankang Deng
129
2
0
22 Oct 2024