Pointer Sentinel Mixture Models
arXiv:1609.07843 · 26 September 2016
Stephen Merity, Caiming Xiong, James Bradbury, R. Socher
Topic: RALM
Papers citing "Pointer Sentinel Mixture Models" (50 of 739 shown)

| Title | Authors | Topics | Cited by | Date |
|---|---|---|---|---|
| A Multi-Grained Self-Interpretable Symbolic-Neural Model For Single/Multi-Labeled Text Classification | Xiang Hu, Xinyu Kong, Kewei Tu | MILM, BDL | 5 | 06 Mar 2023 |
| Goal Driven Discovery of Distributional Differences via Language Descriptions | Ruiqi Zhong, Peter Zhang, Steve Li, Jinwoo Ahn, Dan Klein, Jacob Steinhardt | | 49 | 28 Feb 2023 |
| Full Stack Optimization of Transformer Inference: a Survey | Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, …, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Y. Shao, A. Gholami | MQ | 102 | 27 Feb 2023 |
| MUX-PLMs: Data Multiplexing for High-throughput Language Models | Vishvak Murahari, Ameet Deshpande, Carlos E. Jimenez, Izhak Shafran, Mingqiu Wang, Yuan Cao, Karthik Narasimhan | MoE | 5 | 24 Feb 2023 |
| kNN-Adapter: Efficient Domain Adaptation for Black-Box Language Models | Yangsibo Huang, Daogao Liu, Zexuan Zhong, Weijia Shi, Y. Lee | RALM, ALM | 14 | 21 Feb 2023 |
| Poisoning Web-Scale Training Datasets is Practical | Nicholas Carlini, Matthew Jagielski, Christopher A. Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum S. Anderson, Andreas Terzis, Kurt Thomas, Florian Tramèr | SILM | 182 | 20 Feb 2023 |
| Privately Customizing Prefinetuning to Better Match User Data in Federated Learning | Charlie Hou, Hongyuan Zhan, Akshat Shrivastava, Sida I. Wang, S. Livshits, Giulia Fanti, Daniel Lazar | FedML | 15 | 17 Feb 2023 |
| Role of Bias Terms in Dot-Product Attention | Mahdi Namazifar, Devamanyu Hazarika, Dilek Z. Hakkani-Tür | | 3 | 16 Feb 2023 |
| A Reparameterized Discrete Diffusion Model for Text Generation | Lin Zheng, Jianbo Yuan, Lei Yu, Lingpeng Kong | DiffM | 57 | 11 Feb 2023 |
| SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks | Mahdi Nikdan, Tommaso Pegolotti, Eugenia Iofinova, Eldar Kurtic, Dan Alistarh | | 11 | 09 Feb 2023 |
| Toolformer: Language Models Can Teach Themselves to Use Tools | Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom | SyDa, RALM | 1,624 | 09 Feb 2023 |
| Efficient Attention via Control Variates | Lin Zheng, Jianbo Yuan, Chong-Jun Wang, Lingpeng Kong | | 18 | 09 Feb 2023 |
| Revisiting Offline Compression: Going Beyond Factorization-based Methods for Transformer Language Models | Mohammadreza Banaei, Klaudia Bałazy, Artur Kasymov, R. Lebret, Jacek Tabor, Karl Aberer | OffRL | 0 | 08 Feb 2023 |
| What Matters In The Structured Pruning of Generative Language Models? | Michael Santacroce, Zixin Wen, Yelong Shen, Yuan-Fang Li | | 33 | 07 Feb 2023 |
| REPLUG: Retrieval-Augmented Black-Box Language Models | Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, M. Lewis, Luke Zettlemoyer, Wen-tau Yih | RALM, VLM, KELM | 586 | 30 Jan 2023 |
| Context-Aware Differential Privacy for Language Modeling | M. H. Dinh, Ferdinando Fioretto | | 2 | 28 Jan 2023 |
| SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient | Max Ryabinin, Tim Dettmers, Michael Diskin, Alexander Borzunov | MoE | 31 | 27 Jan 2023 |
| Read the Signs: Towards Invariance to Gradient Descent's Hyperparameter Initialization | Davood Wadi, M. Fredette, S. Sénécal | ODL, AI4CE | 0 | 24 Jan 2023 |
| Why do Nearest Neighbor Language Models Work? | Frank F. Xu, Uri Alon, Graham Neubig | RALM | 22 | 07 Jan 2023 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | Elias Frantar, Dan Alistarh | VLM | 643 | 02 Jan 2023 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | Daniel Y. Fu, Tri Dao, Khaled Kamal Saab, A. Thomas, Atri Rudra, Christopher Ré | | 372 | 28 Dec 2022 |
| Training Integer-Only Deep Recurrent Neural Networks | V. Nia, Eyyub Sari, Vanessa Courville, M. Asgharian | MQ | 2 | 22 Dec 2022 |
| In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models | Yukun Huang, Yanda Chen, Zhou Yu, Kathleen McKeown | | 30 | 20 Dec 2022 |
| On the Blind Spots of Model-Based Evaluation Metrics for Text Generation | Tianxing He, Jingyu Zhang, Tianle Wang, Sachin Kumar, Kyunghyun Cho, James R. Glass, Yulia Tsvetkov | | 44 | 20 Dec 2022 |
| Evaluating Human-Language Model Interaction | Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, …, Hancheng Cao, Tony Lee, Rishi Bommasani, Michael S. Bernstein, Percy Liang | LM&MA, ALM | 100 | 19 Dec 2022 |
| Language model acceptability judgements are not always robust to context | Koustuv Sinha, Jon Gauthier, Aaron Mueller, Kanishka Misra, Keren Fuentes, R. Levy, Adina Williams | | 18 | 18 Dec 2022 |
| Technical Report -- Competition Solution for Prompt Tuning using Pretrained Language Model | Jiang-Long Song, Wuhe Zou, Feng Li, Xiaolei Qin, Weidong Zhang | | 0 | 13 Dec 2022 |
| A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization | Ashwinee Panda, Xinyu Tang, Saeed Mahloujifar, Vikash Sehwag, Prateek Mittal | | 11 | 08 Dec 2022 |
| Editing Models with Task Arithmetic | Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi | KELM, MoMe, MU | 443 | 08 Dec 2022 |
| Statistical and Computational Guarantees for Influence Diagnostics | Jillian R. Fisher, Lang Liu, Krishna Pillutla, Y. Choi, Zaïd Harchaoui | TDI | 0 | 08 Dec 2022 |
| Meta-Learning Fast Weight Language Models | Kevin Clark, Kelvin Guu, Ming-Wei Chang, Panupong Pasupat, Geoffrey E. Hinton, Mohammad Norouzi | KELM | 13 | 05 Dec 2022 |
| Momentum Decoding: Open-ended Text Generation As Graph Exploration | Tian Lan, Yixuan Su, Shuhang Liu, Heyan Huang, Xian-Ling Mao | | 5 | 05 Dec 2022 |
| Language Model Pre-training on True Negatives | Zhuosheng Zhang, Hai Zhao, Masao Utiyama, Eiichiro Sumita | | 2 | 01 Dec 2022 |
| ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting | Shancheng Fang, Zhendong Mao, Hongtao Xie, Yuxin Wang, C. Yan, Yongdong Zhang | | 53 | 19 Nov 2022 |
| SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models | Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, Song Han | MQ | 749 | 18 Nov 2022 |
| Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers | Z. Yao, Xiaoxia Wu, Conglong Li, Connor Holmes, Minjia Zhang, Cheng-rong Li, Yuxiong He | | 11 | 17 Nov 2022 |
| QuantPipe: Applying Adaptive Post-Training Quantization for Distributed Transformer Pipelines in Dynamic Edge Environments | Hong Wang, Connor Imes, Souvik Kundu, Peter A. Beerel, S. Crago, J. Walters | MQ | 7 | 08 Nov 2022 |
| No Word Embedding Model Is Perfect: Evaluating the Representation Accuracy for Social Bias in the Media | Maximilian Spliethover, Maximilian Keiff, Henning Wachsmuth | | 4 | 07 Nov 2022 |
| How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers | Michael Hassid, Hao Peng, Daniel Rotem, Jungo Kasai, Ivan Montero, Noah A. Smith, Roy Schwartz | | 25 | 07 Nov 2022 |
| Circling Back to Recurrent Models of Language | Gábor Melis | | 0 | 03 Nov 2022 |
| Generative Adversarial Training Can Improve Neural Language Models | Sajad Movahedi, A. Shakery | GAN, AI4CE | 2 | 02 Nov 2022 |
| L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient and Accurate Deep Learning | Mohammadreza Alimohammadi, I. Markov, Elias Frantar, Dan Alistarh | | 5 | 31 Oct 2022 |
| GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh | MQ | 905 | 31 Oct 2022 |
| A Solvable Model of Neural Scaling Laws | A. Maloney, Daniel A. Roberts, J. Sully | | 51 | 30 Oct 2022 |
| Nearest Neighbor Language Models for Stylistic Controllable Generation | Severino Trotta, Lucie Flek, Charles F Welch | | 4 | 27 Oct 2022 |
| Contrastive Decoding: Open-ended Text Generation as Optimization | Xiang Lisa Li, Ari Holtzman, Daniel Fried, Percy Liang, Jason Eisner, Tatsunori Hashimoto, Luke Zettlemoyer, M. Lewis | | 332 | 27 Oct 2022 |
| Revision for Concision: A Constrained Paraphrase Generation Task | Wenchuan Mu, Kwanin Lim | | 3 | 25 Oct 2022 |
| Contrastive Search Is What You Need For Neural Text Generation | Yixuan Su, Nigel Collier | | 50 | 25 Oct 2022 |
| Learning to Invert: Simple Adaptive Attacks for Gradient Inversion in Federated Learning | Ruihan Wu, Xiangyu Chen, Chuan Guo, Kilian Q. Weinberger | FedML | 26 | 19 Oct 2022 |
| The Devil in Linear Transformer | Zhen Qin, Xiaodong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, Yiran Zhong | | 71 | 19 Oct 2022 |