ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1910.10683
  4. Cited By
Exploring the Limits of Transfer Learning with a Unified Text-to-Text
  Transformer

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

23 October 2019
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
    AIMat
ArXivPDFHTML

Papers citing "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

50 / 8,796 papers shown
Title
Data Movement Is All You Need: A Case Study on Optimizing Transformers
Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov
Nikoli Dryden
Tal Ben-Nun
Shigang Li
Torsten Hoefler
36
131
0
30 Jun 2020
GShard: Scaling Giant Models with Conditional Computation and Automatic
  Sharding
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin
HyoukJoong Lee
Yuanzhong Xu
Dehao Chen
Orhan Firat
Yanping Huang
M. Krikun
Noam M. Shazeer
Z. Chen
MoE
43
1,108
0
30 Jun 2020
Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for
  Improved Generalization
Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization
Sang Michael Xie
Tengyu Ma
Percy Liang
35
13
0
29 Jun 2020
Answering Questions on COVID-19 in Real-Time
Answering Questions on COVID-19 in Real-Time
Jinhyuk Lee
Sean S. Yi
Minbyul Jeong
Mujeen Sung
Wonjin Yoon
Yonghwa Choi
Miyoung Ko
Jaewoo Kang
18
43
0
29 Jun 2020
The Depth-to-Width Interplay in Self-Attention
The Depth-to-Width Interplay in Self-Attention
Yoav Levine
Noam Wies
Or Sharir
Hofit Bata
Amnon Shashua
30
45
0
22 Jun 2020
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of
  Gradients
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients
Chenfei Zhu
Yu Cheng
Zhe Gan
Furong Huang
Jingjing Liu
Tom Goldstein
ODL
35
2
0
21 Jun 2020
Cross-lingual Retrieval for Iterative Self-Supervised Training
Cross-lingual Retrieval for Iterative Self-Supervised Training
C. Tran
Y. Tang
Xian Li
Jiatao Gu
RALM
28
72
0
16 Jun 2020
To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on
  Resource Rich Tasks
To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks
Sinong Wang
Madian Khabsa
Hao Ma
18
26
0
15 Jun 2020
A Monolingual Approach to Contextualized Word Embeddings for
  Mid-Resource Languages
A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages
Pedro Ortiz Suarez
Laurent Romary
Benoît Sagot
25
227
0
11 Jun 2020
Linformer: Self-Attention with Linear Complexity
Linformer: Self-Attention with Linear Complexity
Sinong Wang
Belinda Z. Li
Madian Khabsa
Han Fang
Hao Ma
63
1,647
0
08 Jun 2020
CycleGT: Unsupervised Graph-to-Text and Text-to-Graph Generation via
  Cycle Training
CycleGT: Unsupervised Graph-to-Text and Text-to-Graph Generation via Cycle Training
Qipeng Guo
Zhijing Jin
Xipeng Qiu
Weinan Zhang
David Wipf
Zheng-Wei Zhang
54
60
0
08 Jun 2020
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Pengcheng He
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
AAML
64
2,622
0
05 Jun 2020
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient
  Language Processing
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
Zihang Dai
Guokun Lai
Yiming Yang
Quoc V. Le
48
229
0
05 Jun 2020
A Survey on Transfer Learning in Natural Language Processing
A Survey on Transfer Learning in Natural Language Processing
Zaid Alyafeai
Maged S. Alshaibani
Irfan Ahmad
30
72
0
31 May 2020
Language Models are Few-Shot Learners
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
56
40,200
0
28 May 2020
Syntactic Structure Distillation Pretraining For Bidirectional Encoders
Syntactic Structure Distillation Pretraining For Bidirectional Encoders
A. Kuncoro
Lingpeng Kong
Daniel Fried
Dani Yogatama
Laura Rimell
Chris Dyer
Phil Blunsom
31
33
0
27 May 2020
Predict-then-Decide: A Predictive Approach for Wait or Answer Task in
  Dialogue Systems
Predict-then-Decide: A Predictive Approach for Wait or Answer Task in Dialogue Systems
Zehao Lin
Shaobo Cui
Guodun Li
Xiaoming Kang
Feng Ji
Feng-Lin Li
Zhongzhou Zhao
Haiqing Chen
Yin Zhang
34
1
0
27 May 2020
NILE : Natural Language Inference with Faithful Natural Language
  Explanations
NILE : Natural Language Inference with Faithful Natural Language Explanations
Sawan Kumar
Partha P. Talukdar
XAI
LRM
13
159
0
25 May 2020
Summarizing and Exploring Tabular Data in Conversational Search
Summarizing and Exploring Tabular Data in Conversational Search
Shuo Zhang
Zhuyun Dai
K. Balog
Jamie Callan
RALM
LMTD
18
39
0
23 May 2020
Text-to-Text Pre-Training for Data-to-Text Tasks
Text-to-Text Pre-Training for Data-to-Text Tasks
Mihir Kale
Abhinav Rastogi
AI4CE
13
200
0
21 May 2020
BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based
  Quantized DNNs
BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs
Yongkweon Jeon
Baeseong Park
S. Kwon
Byeongwook Kim
Jeongin Yun
Dongsoo Lee
MQ
33
30
0
20 May 2020
Normalized Attention Without Probability Cage
Normalized Attention Without Probability Cage
Oliver Richter
Roger Wattenhofer
14
21
0
19 May 2020
Movement Pruning: Adaptive Sparsity by Fine-Tuning
Movement Pruning: Adaptive Sparsity by Fine-Tuning
Victor Sanh
Thomas Wolf
Alexander M. Rush
32
468
0
15 May 2020
Large Scale Multi-Actor Generative Dialog Modeling
Large Scale Multi-Actor Generative Dialog Modeling
Alex Boyd
Raul Puri
M. Shoeybi
M. Patwary
Bryan Catanzaro
19
23
0
13 May 2020
Multi-Stage Conversational Passage Retrieval: An Approach to Fusing Term
  Importance Estimation and Neural Query Rewriting
Multi-Stage Conversational Passage Retrieval: An Approach to Fusing Term Importance Estimation and Neural Query Rewriting
Sheng-Chieh Lin
Jheng-Hong Yang
Rodrigo Nogueira
Ming-Feng Tsai
Chuan-Ju Wang
Jimmy J. Lin
24
24
0
05 May 2020
Teaching Machine Comprehension with Compositional Explanations
Teaching Machine Comprehension with Compositional Explanations
Qinyuan Ye
Xiao Huang
Elizabeth Boschee
Xiang Ren
LRM
ReLM
19
34
0
02 May 2020
ForecastQA: A Question Answering Challenge for Event Forecasting with
  Temporal Text Data
ForecastQA: A Question Answering Challenge for Event Forecasting with Temporal Text Data
Woojeong Jin
Rahul Khanna
Suji Kim
Dong-Ho Lee
Fred Morstatter
Aram Galstyan
Xiang Ren
AI4TS
11
36
0
02 May 2020
UnifiedQA: Crossing Format Boundaries With a Single QA System
UnifiedQA: Crossing Format Boundaries With a Single QA System
Daniel Khashabi
Sewon Min
Tushar Khot
Ashish Sabharwal
Oyvind Tafjord
Peter Clark
Hannaneh Hajishirzi
35
720
0
02 May 2020
Probing Contextual Language Models for Common Ground with Visual
  Representations
Probing Contextual Language Models for Common Ground with Visual Representations
Gabriel Ilharco
Rowan Zellers
Ali Farhadi
Hannaneh Hajishirzi
30
14
0
01 May 2020
Beneath the Tip of the Iceberg: Current Challenges and New Directions in
  Sentiment Analysis Research
Beneath the Tip of the Iceberg: Current Challenges and New Directions in Sentiment Analysis Research
Soujanya Poria
Devamanyu Hazarika
Navonil Majumder
Rada Mihalcea
42
207
0
01 May 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation
  Pre-training
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Linjie Li
Yen-Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
MLLM
VLM
OffRL
AI4TS
43
493
0
01 May 2020
Recipes for Adapting Pre-trained Monolingual and Multilingual Models to
  Machine Translation
Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation
Asa Cooper Stickland
Xian Li
Marjan Ghazvininejad
36
44
0
30 Apr 2020
Mind Your Inflections! Improving NLP for Non-Standard Englishes with
  Base-Inflection Encoding
Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding
Samson Tan
Chenyu You
L. Varshney
Min-Yen Kan
17
34
0
30 Apr 2020
Pre-training Is (Almost) All You Need: An Application to Commonsense
  Reasoning
Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning
Alexandre Tamborrino
Nicola Pellicanò
B. Pannier
Pascal Voitot
Louise Naudin
LRM
16
62
0
29 Apr 2020
Multi-Task Learning for Dense Prediction Tasks: A Survey
Multi-Task Learning for Dense Prediction Tasks: A Survey
Simon Vandenhende
Stamatios Georgoulis
Wouter Van Gansbeke
Marc Proesmans
Dengxin Dai
Luc Van Gool
CVBM
29
72
0
28 Apr 2020
Syntactic Data Augmentation Increases Robustness to Inference Heuristics
Syntactic Data Augmentation Increases Robustness to Inference Heuristics
Junghyun Min
R. Thomas McCoy
Dipanjan Das
Emily Pitler
Tal Linzen
30
175
0
24 Apr 2020
Generative Data Augmentation for Commonsense Reasoning
Generative Data Augmentation for Commonsense Reasoning
Yiben Yang
Chaitanya Malaviya
Jared Fernandez
Swabha Swayamdipta
Ronan Le Bras
Ji-ping Wang
Chandra Bhagavatula
Yejin Choi
Doug Downey
LRM
27
91
0
24 Apr 2020
A Review of Winograd Schema Challenge Datasets and Approaches
A Review of Winograd Schema Challenge Datasets and Approaches
Vid Kocijan
Thomas Lukasiewicz
E. Davis
G. Marcus
L. Morgenstern
25
43
0
23 Apr 2020
MT-Clinical BERT: Scaling Clinical Information Extraction with Multitask
  Learning
MT-Clinical BERT: Scaling Clinical Information Extraction with Multitask Learning
Andriy Mulyar
Bridget T. McInnes
13
52
0
21 Apr 2020
Fine-tuning Multi-hop Question Answering with Hierarchical Graph Network
Guanming Xiong
26
0
0
20 Apr 2020
The Cost of Training NLP Models: A Concise Overview
The Cost of Training NLP Models: A Concise Overview
Or Sharir
Barak Peleg
Y. Shoham
40
210
0
19 Apr 2020
Can You Put it All Together: Evaluating Conversational Agents' Ability
  to Blend Skills
Can You Put it All Together: Evaluating Conversational Agents' Ability to Blend Skills
Eric Michael Smith
Mary Williamson
Kurt Shuster
Jason Weston
Y-Lan Boureau
19
221
0
17 Apr 2020
The Right Tool for the Job: Matching Model and Instance Complexities
The Right Tool for the Job: Matching Model and Instance Complexities
Roy Schwartz
Gabriel Stanovsky
Swabha Swayamdipta
Jesse Dodge
Noah A. Smith
38
167
0
16 Apr 2020
CrisisBench: Benchmarking Crisis-related Social Media Datasets for
  Humanitarian Information Processing
CrisisBench: Benchmarking Crisis-related Social Media Datasets for Humanitarian Information Processing
Firoj Alam
Hassan Sajjad
Muhammad Imran
Ferda Ofli
18
14
0
14 Apr 2020
CLUE: A Chinese Language Understanding Evaluation Benchmark
CLUE: A Chinese Language Understanding Evaluation Benchmark
Liang Xu
Hai Hu
Xuanwei Zhang
Lu Li
Chenjie Cao
...
Cong Yue
Xinrui Zhang
Zhen-Yi Yang
Kyle Richardson
Zhenzhong Lan
ELM
45
377
0
13 Apr 2020
Unsupervised Commonsense Question Answering with Self-Talk
Unsupervised Commonsense Question Answering with Self-Talk
Vered Shwartz
Peter West
Ronan Le Bras
Chandra Bhagavatula
Yejin Choi
ReLM
SSL
AI4MH
LRM
19
257
0
11 Apr 2020
Longformer: The Long-Document Transformer
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM
VLM
28
3,929
0
10 Apr 2020
Exploring Versatile Generative Language Model Via Parameter-Efficient
  Transfer Learning
Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning
Zhaojiang Lin
Andrea Madotto
Pascale Fung
34
155
0
08 Apr 2020
Downstream Model Design of Pre-trained Language Model for Relation
  Extraction Task
Downstream Model Design of Pre-trained Language Model for Relation Extraction Task
Cheng-rong Li
Ye Tian
19
36
0
08 Apr 2020
Byte Pair Encoding is Suboptimal for Language Model Pretraining
Byte Pair Encoding is Suboptimal for Language Model Pretraining
Kaj Bostrom
Greg Durrett
14
200
0
07 Apr 2020
Previous
123...174175176
Next