Pointer Sentinel Mixture Models
arXiv:1609.07843
26 September 2016
Stephen Merity, Caiming Xiong, James Bradbury, R. Socher
[RALM]

Papers citing "Pointer Sentinel Mixture Models" (50 of 702 papers shown):
- Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space. Mor Geva, Avi Caciularu, Ke Wang, Yoav Goldberg. 28 Mar 2022. [KELM]
- Linearizing Transformer with Key-Value Memory. Yizhe Zhang, Deng Cai. 23 Mar 2022.
- Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal. Umang Gupta, Jwala Dhamala, Varun Kumar, Apurv Verma, Yada Pruksachatkun, Satyapriya Krishna, Rahul Gupta, Kai-Wei Chang, Greg Ver Steeg, Aram Galstyan. 23 Mar 2022.
- Compression of Generative Pre-trained Language Models via Quantization. Chaofan Tao, Lu Hou, Wei Zhang, Lifeng Shang, Xin Jiang, Qun Liu, Ping Luo, Ngai Wong. 21 Mar 2022. [MQ]
- Dependency-based Mixture Language Models. Zhixian Yang, Xiaojun Wan. 19 Mar 2022.
- Training a Tokenizer for Free with Private Federated Learning. Eugene Bagdasaryan, Congzheng Song, Rogier van Dalen, M. Seigel, Áine Cahill. 15 Mar 2022. [FedML]
- Uncertainty Estimation for Language Reward Models. Adam Gleave, G. Irving. 14 Mar 2022. [UQLM]
- Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer. Greg Yang, J. E. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, J. Pachocki, Weizhu Chen, Jianfeng Gao. 07 Mar 2022.
- Mukayese: Turkish NLP Strikes Back. Ali Safaya, Emirhan Kurtuluş, Arda Göktoğan, Deniz Yuret. 02 Mar 2022.
- Fast-R2D2: A Pretrained Recursive Neural Network based on Pruned CKY for Grammar Induction and Text Representation. Xiang Hu, Haitao Mi, Liang Li, Gerard de Melo. 01 Mar 2022.
- Oolong: Investigating What Makes Transfer Learning Hard with Controlled Studies. Zhengxuan Wu, Alex Tamkin, Isabel Papadimitriou. 24 Feb 2022.
- Interpreting Language Models with Contrastive Explanations. Kayo Yin, Graham Neubig. 21 Feb 2022. [MILM]
- cosFormer: Rethinking Softmax in Attention. Zhen Qin, Weixuan Sun, Huicai Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong. 17 Feb 2022.
- General-purpose, long-context autoregressive modeling with Perceiver AR. Curtis Hawthorne, Andrew Jaegle, Cătălina Cangea, Sebastian Borgeaud, C. Nash, …, Hannah R. Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, João Carreira, Jesse Engel. 15 Feb 2022.
- Flowformer: Linearizing Transformers with Conservation Flows. Haixu Wu, Jialong Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. 13 Feb 2022.
- The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention. Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber. 11 Feb 2022.
- Cedille: A large autoregressive French language model. Martin Müller, Florian Laurent. 07 Feb 2022.
- Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data. Yaoqing Yang, Ryan Theisen, Liam Hodgkinson, Joseph E. Gonzalez, Kannan Ramchandran, Charles H. Martin, Michael W. Mahoney. 06 Feb 2022.
- Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers. Youjie Li, Amar Phanishayee, D. Murray, Jakub Tarnawski, N. Kim. 02 Feb 2022.
- Unified Scaling Laws for Routed Language Models. Aidan Clark, Diego de Las Casas, Aurelia Guy, A. Mensch, Michela Paganini, …, Oriol Vinyals, Jack W. Rae, Erich Elsen, Koray Kavukcuoglu, Karen Simonyan. 02 Feb 2022. [MoE]
- Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval. Uri Alon, Frank F. Xu, Junxian He, Sudipta Sengupta, Dan Roth, Graham Neubig. 28 Jan 2022. [RALM]
- Benchmarking Resource Usage for Efficient Distributed Deep Learning. Nathan C. Frey, Baolin Li, Joseph McDonald, Dan Zhao, Michael Jones, David Bestor, Devesh Tiwari, V. Gadepally, S. Samsi. 28 Jan 2022.
- Describing Differences between Text Distributions with Natural Language. Ruiqi Zhong, Charles Burton Snell, Dan Klein, Jacob Steinhardt. 28 Jan 2022. [VLM]
- Can Wikipedia Help Offline Reinforcement Learning? Machel Reid, Yutaro Yamada, S. Gu. 28 Jan 2022. [3DV, RALM, OffRL]
- A Secure and Efficient Federated Learning Framework for NLP. Jieren Deng, Chenghong Wang, Xianrui Meng, Yijue Wang, Ji Li, Sheng Lin, Shuo Han, Fei Miao, Sanguthevar Rajasekaran, Caiwen Ding. 28 Jan 2022. [FedML]
- Artefact Retrieval: Overview of NLP Models with Knowledge Base Access. Vilém Zouhar, Marius Mosbach, Debanjali Biswas, Dietrich Klakow. 24 Jan 2022. [KELM]
- FedComm: Federated Learning as a Medium for Covert Communication. Dorjan Hitaj, Giulio Pagnotta, Briland Hitaj, Fernando Perez-Cruz, L. Mancini. 21 Jan 2022. [FedML]
- May the Force Be with Your Copy Mechanism: Enhanced Supervised-Copy Method for Natural Language Generation. Sanghyuk Choi, J. Hwang, Hyungjong Noh, Yeonsoo Lee. 20 Dec 2021.
- Towards More Efficient Insertion Transformer with Fractional Positional Encoding. Zhisong Zhang, Yizhe Zhang, W. Dolan. 12 Dec 2021.
- Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval. Zhongping Zhang, Yiwen Gu, Bryan A. Plummer. 11 Dec 2021.
- Improving language models by retrieving from trillions of tokens. Sebastian Borgeaud, A. Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, …, Simon Osindero, Karen Simonyan, Jack W. Rae, Erich Elsen, Laurent Sifre. 08 Dec 2021. [KELM, RALM]
- Membership Inference Attacks From First Principles. Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, Florian Tramèr. 07 Dec 2021. [MIACV, MIALM]
- Public Data-Assisted Mirror Descent for Private Model Training. Ehsan Amid, Arun Ganesh, Rajiv Mathews, Swaroop Indra Ramaswamy, Shuang Song, Thomas Steinke, Vinith Suriyakumar, Om Thakkar, Abhradeep Thakurta. 01 Dec 2021.
- Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models. Tri Dao, Beidi Chen, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, Christopher Ré. 30 Nov 2021.
- How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN. R. Thomas McCoy, P. Smolensky, Tal Linzen, Jianfeng Gao, Asli Celikyilmaz. 18 Nov 2021. [SyDa]
- On Training Implicit Models. Zhengyang Geng, Xinyu Zhang, Shaojie Bai, Yisen Wang, Zhouchen Lin. 09 Nov 2021.
- Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling. Renrui Zhang, Rongyao Fang, Wei Zhang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li. 06 Nov 2021. [VLM]
- Backdoor Pre-trained Models Can Transfer to All. Lujia Shen, S. Ji, Xuhong Zhang, Jinfeng Li, Jing Chen, Jie Shi, Chengfang Fang, Jianwei Yin, Ting Wang. 30 Oct 2021. [AAML, SILM]
- Robbing the Fed: Directly Obtaining Private Data in Federated Learning with Modified Models. Liam H. Fowl, Jonas Geiping, W. Czaja, Micah Goldblum, Tom Goldstein. 25 Oct 2021. [FedML]
- AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning. Siddharth Singh, A. Bhatele. 25 Oct 2021. [GNN]
- Contrastive Learning for Neural Topic Model. Thong Nguyen, A. Luu. 25 Oct 2021.
- Accelerating Framework of Transformer by Hardware Design and Model Compression Co-Optimization. Panjie Qi, E. Sha, Qingfeng Zhuge, Hongwu Peng, Shaoyi Huang, Zhenglun Kong, Yuhong Song, Bingbing Li. 19 Oct 2021.
- Compositional Attention: Disentangling Search and Retrieval. Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio, Guillaume Lajoie. 18 Oct 2021.
- Self-Supervised Representation Learning: Introduction, Advances and Challenges. Linus Ericsson, Henry Gouk, Chen Change Loy, Timothy M. Hospedales. 18 Oct 2021. [SSL, OOD, AI4TS]
- GNN-LM: Language Modeling based on Global Contexts via GNN. Yuxian Meng, Shi Zong, Xiaoya Li, Xiaofei Sun, Tianwei Zhang, Fei Wu, Jiwei Li. 17 Oct 2021. [LRM]
- Hydra: A System for Large Multi-Model Deep Learning. Kabir Nagrecha, Arun Kumar. 16 Oct 2021. [MoE, AI4CE]
- An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. Nicholas Meade, Elinor Poole-Dayan, Siva Reddy. 16 Oct 2021.
- A Short Study on Compressing Decoder-Based Language Models. Tianda Li, Yassir El Mesbahi, I. Kobyzev, Ahmad Rashid, A. Mahmud, Nithin Anchuri, Habib Hajimolahoseini, Yang Liu, Mehdi Rezagholizadeh. 16 Oct 2021.
- Kronecker Decomposition for GPT Compression. Ali Edalati, Marzieh S. Tahaei, Ahmad Rashid, V. Nia, J. Clark, Mehdi Rezagholizadeh. 15 Oct 2021.
- Towards More Effective and Economic Sparsely-Activated Model. Hao Jiang, Ke Zhan, Jianwei Qu, Yongkang Wu, Zhaoye Fei, …, Enrui Hu, Yinxia Zhang, Yantao Jia, Fan Yu, Bo Zhao. 14 Oct 2021. [MoE]