FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference

15 December 2022
Michiel de Jong
Yury Zemlyanskiy
Joshua Ainslie
Nicholas FitzGerald
Sumit Sanghai
Fei Sha
William W. Cohen
    VLM

Papers citing "FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference"

33 / 33 papers shown
Yi: Open Foundation Models by 01.AI
01.AI
Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
233
550
0
07 Mar 2024
Can Open-Domain QA Reader Utilize External Knowledge Efficiently like Humans?
Neeraj Varshney
Man Luo
Chitta Baral
RALM
38
12
0
23 Nov 2022
Efficiently Scaling Transformer Inference
Reiner Pope
Sholto Douglas
Aakanksha Chowdhery
Jacob Devlin
James Bradbury
Anselm Levskaya
Jonathan Heek
Kefan Xiao
Shivani Agrawal
J. Dean
75
321
0
09 Nov 2022
Decoupled Context Processing for Context Augmented Language Modeling
Zonglin Li
Ruiqi Guo
Surinder Kumar
RALM
KELM
41
24
0
11 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
342
1,090
0
05 Oct 2022
FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation
Sebastian Hofstätter
Jiecao Chen
K. Raman
Hamed Zamani
RALM
75
81
0
28 Sep 2022
Generate rather than Retrieve: Large Language Models are Strong Context Generators
Wenhao Yu
Dan Iter
Shuohang Wang
Yichong Xu
Mingxuan Ju
Soumya Sanyal
Chenguang Zhu
Michael Zeng
Meng Jiang
RALM
AIMat
300
335
0
21 Sep 2022
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Tim Dettmers
M. Lewis
Younes Belkada
Luke Zettlemoyer
MQ
76
649
0
15 Aug 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
193
2,200
0
27 May 2022
Scaling Up Models and Data with t5x and seqio
Adam Roberts
Hyung Won Chung
Anselm Levskaya
Gaurav Mishra
James Bradbury
...
Brennan Saeta
Ryan Sepassi
A. Spiridonov
Joshua Newlan
Andrea Gesmundo
ALM
91
196
0
31 Mar 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
497
3,589
0
21 Mar 2022
Memorizing Transformers
Yuhuai Wu
M. Rabe
DeLesley S. Hutchins
Christian Szegedy
RALM
68
177
0
16 Mar 2022
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Mandy Guo
Joshua Ainslie
David C. Uthus
Santiago Ontanon
Jianmo Ni
Yun-hsuan Sung
Yinfei Yang
VLM
55
313
0
15 Dec 2021
Improving language models by retrieving from trillions of tokens
Sebastian Borgeaud
A. Mensch
Jordan Hoffmann
Trevor Cai
Eliza Rutherford
...
Simon Osindero
Karen Simonyan
Jack W. Rae
Erich Elsen
Laurent Sifre
KELM
RALM
216
1,083
0
08 Dec 2021
Mention Memory: incorporating textual knowledge into Transformers through entity mention attention
Michiel de Jong
Yury Zemlyanskiy
Nicholas FitzGerald
Fei Sha
William W. Cohen
RALM
68
47
0
12 Oct 2021
KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering
Donghan Yu
Chenguang Zhu
Yuwei Fang
Wenhao Yu
Shuohang Wang
Yichong Xu
Xiang Ren
Yiming Yang
Michael Zeng
52
90
0
08 Oct 2021
ReadTwice: Reading Very Large Documents with Memories
Yury Zemlyanskiy
Joshua Ainslie
Michiel de Jong
Philip Pham
Ilya Eckstein
Fei Sha
AIMat
RALM
62
17
0
10 May 2021
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Jesse Dodge
Maarten Sap
Ana Marasović
William Agnew
Gabriel Ilharco
Dirk Groeneveld
Margaret Mitchell
Matt Gardner
AILaw
96
443
0
18 Apr 2021
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
Gautier Izacard
Edouard Grave
RALM
115
1,167
0
02 Jul 2020
Knowledge Distillation: A Survey
Jianping Gou
B. Yu
Stephen J. Maybank
Dacheng Tao
VLM
60
2,932
0
09 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
675
41,736
0
28 May 2020
Dense Passage Retrieval for Open-Domain Question Answering
Vladimir Karpukhin
Barlas Oğuz
Sewon Min
Patrick Lewis
Ledell Yu Wu
Sergey Edunov
Danqi Chen
Wen-tau Yih
RALM
154
3,739
0
10 Apr 2020
How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Adam Roberts
Colin Raffel
Noam M. Shazeer
KELM
104
889
0
10 Feb 2020
REALM: Retrieval-Augmented Language Model Pre-Training
Kelvin Guu
Kenton Lee
Zora Tung
Panupong Pasupat
Ming-Wei Chang
RALM
103
2,090
0
10 Feb 2020
Fast Transformer Decoding: One Write-Head is All You Need
Noam M. Shazeer
136
459
0
06 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
375
20,053
0
23 Oct 2019
Latent Retrieval for Weakly Supervised Open Domain Question Answering
Kenton Lee
Ming-Wei Chang
Kristina Toutanova
RALM
94
1,010
0
01 Jun 2019
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Noam M. Shazeer
Mitchell Stern
ODL
72
1,043
0
11 Apr 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
640
130,942
0
12 Jun 2017
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Mandar Joshi
Eunsol Choi
Daniel S. Weld
Luke Zettlemoyer
RALM
195
2,636
0
09 May 2017
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
316
19,609
0
09 Mar 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
1.5K
149,842
0
22 Dec 2014
Sequence Transduction with Recurrent Neural Networks
Alex Graves
175
1,866
0
14 Nov 2012