1609.07843
Pointer Sentinel Mixture Models
26 September 2016
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
Papers citing "Pointer Sentinel Mixture Models"
50 / 716 papers shown
Title
Minimalistic Unsupervised Learning with the Sparse Manifold Transform
Yubei Chen
Zeyu Yun
Yi Ma
Bruno A. Olshausen
Yann LeCun
54
8
0
30 Sep 2022
Depth-Wise Attention (DWAtt): A Layer Fusion Method for Data-Efficient Classification
Muhammad N. ElNokrashy
Badr AlKhamissi
Mona T. Diab
MoMe
25
4
0
30 Sep 2022
Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging
Jean Kaddour
MoMe
3DH
24
40
0
29 Sep 2022
Mega: Moving Average Equipped Gated Attention
Xuezhe Ma
Chunting Zhou
Xiang Kong
Junxian He
Liangke Gui
Graham Neubig
Jonathan May
Luke Zettlemoyer
38
183
0
21 Sep 2022
Can Offline Reinforcement Learning Help Natural Language Understanding?
Ziqi Zhang
Yile Wang
Yue Zhang
Donglin Wang
OffRL
38
0
0
15 Sep 2022
PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters Automatically
Sedrick Scott Keh
Steven Y. Feng
Varun Gangal
Malihe Alikhani
Eduard H. Hovy
23
4
0
13 Sep 2022
Deep Learning-based approaches for automatic detection of shell nouns and evaluation on WikiText-2
C. Yao
Cuihua Wang
8
0
0
25 Aug 2022
Improving Natural-Language-based Audio Retrieval with Transfer Learning and Audio & Text Augmentations
Paul Primus
Gerhard Widmer
29
6
0
24 Aug 2022
Adam Can Converge Without Any Modification On Update Rules
Yushun Zhang
Congliang Chen
Naichen Shi
Ruoyu Sun
Zhimin Luo
18
63
0
20 Aug 2022
A Length Adaptive Algorithm-Hardware Co-design of Transformer on FPGA Through Sparse Attention and Dynamic Pipelining
Hongwu Peng
Shaoyi Huang
Shiyang Chen
Bingbing Li
Tong Geng
...
Weiwen Jiang
Wujie Wen
J. Bi
Hang Liu
Caiwen Ding
47
54
0
07 Aug 2022
Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition
Xudong Xie
Ling Fu
Zhifei Zhang
Zhaowen Wang
X. Bai
ViT
38
45
0
31 Jul 2022
A Variational AutoEncoder for Transformers with Nonparametric Variational Information Bottleneck
James Henderson
Fabio Fehr
DRL
21
3
0
27 Jul 2022
Innovations in Neural Data-to-text Generation: A Survey
Mandar Sharma
Ajay K. Gogineni
Naren Ramakrishnan
36
10
0
25 Jul 2022
Scene Text Recognition with Permuted Autoregressive Sequence Models
Darwin Bautista
Rowel Atienza
28
169
0
14 Jul 2022
Recurrent Memory Transformer
Aydar Bulatov
Yuri Kuratov
Andrey Kravchenko
CLL
13
103
0
14 Jul 2022
Bayesian Modeling of Language-Evoked Event-Related Potentials
Davide Turco
Conor J. Houghton
33
2
0
07 Jul 2022
Probing via Prompting
Jiaoda Li
Ryan Cotterell
Mrinmaya Sachan
37
13
0
04 Jul 2022
Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition
Mingkun Yang
Minghui Liao
Pu Lu
Jing Wang
Shenggao Zhu
Hualin Luo
Qingzhen Tian
X. Bai
SSL
37
55
0
01 Jul 2022
Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning
John Nguyen
Jianyu Wang
Kshitiz Malik
Maziar Sanjabi
Michael G. Rabbat
FedML
AI4CE
31
21
0
30 Jun 2022
Winning the Lottery Ahead of Time: Efficient Early Network Pruning
John Rachwan
Daniel Zügner
Bertrand Charpentier
Simon Geisler
Morgane Ayle
Stephan Günnemann
32
24
0
21 Jun 2022
Bootstrapped Transformer for Offline Reinforcement Learning
Kerong Wang
Hanye Zhao
Xufang Luo
Kan Ren
Weinan Zhang
Dongsheng Li
OffRL
18
37
0
17 Jun 2022
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Percy Liang
J. Dean
W. Fedus
ELM
ReLM
LRM
93
2,364
0
15 Jun 2022
8-bit Numerical Formats for Deep Neural Networks
Badreddine Noune
Philip Jones
Daniel Justus
Dominic Masters
Carlo Luschi
MQ
23
34
0
06 Jun 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Z. Yao
Reza Yazdani Aminabadi
Minjia Zhang
Xiaoxia Wu
Conglong Li
Yuxiong He
VLM
MQ
73
448
0
04 Jun 2022
[Re] Badder Seeds: Reproducing the Evaluation of Lexical Methods for Bias Measurement
Jille van der Togt
Lea Tiyavorabun
Matteo Rosati
Giulio Starace
11
0
0
03 Jun 2022
What Changed? Investigating Debiasing Methods using Causal Mediation Analysis
Su-Ha Jeoung
Jana Diesner
CML
27
7
0
01 Jun 2022
The CLRS Algorithmic Reasoning Benchmark
Petar Veličković
Adria Puigdomenech Badia
David Budden
Razvan Pascanu
Andrea Banino
Mikhail Dashevskiy
R. Hadsell
Charles Blundell
163
89
0
31 May 2022
kNN-Prompt: Nearest Neighbor Zero-Shot Inference
Weijia Shi
Julian Michael
Suchin Gururangan
Luke Zettlemoyer
RALM
VLM
29
32
0
27 May 2022
Quark: Controllable Text Generation with Reinforced Unlearning
Ximing Lu
Sean Welleck
Jack Hessel
Liwei Jiang
Lianhui Qin
Peter West
Prithviraj Ammanabrolu
Yejin Choi
MU
68
206
0
26 May 2022
Training Language Models with Memory Augmentation
Zexuan Zhong
Tao Lei
Danqi Chen
RALM
249
128
0
25 May 2022
Memorization in NLP Fine-tuning Methods
Fatemehsadat Mireshghallah
Archit Uniyal
Tianhao Wang
David Evans
Taylor Berg-Kirkpatrick
AAML
67
39
0
25 May 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
328
4,142
0
24 May 2022
Outliers Dimensions that Disrupt Transformers Are Driven by Frequency
Giovanni Puccetti
Anna Rogers
Aleksandr Drozd
F. Dell’Orletta
81
42
0
23 May 2022
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
Kushal Tirumala
Aram H. Markosyan
Luke Zettlemoyer
Armen Aghajanyan
TDI
31
187
0
22 May 2022
Visually-Augmented Language Modeling
Weizhi Wang
Li Dong
Hao Cheng
Haoyu Song
Xiaodong Liu
Xifeng Yan
Jianfeng Gao
Furu Wei
VLM
36
18
0
20 May 2022
Recovering Private Text in Federated Learning of Language Models
Samyak Gupta
Yangsibo Huang
Zexuan Zhong
Tianyu Gao
Kai Li
Danqi Chen
FedML
40
75
0
17 May 2022
Prompting to Distill: Boosting Data-Free Knowledge Distillation via Reinforced Prompt
Xinyin Ma
Xinchao Wang
Gongfan Fang
Yongliang Shen
Weiming Lu
24
11
0
16 May 2022
What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
Bryan Seybold
John F. Canny
33
6
0
12 May 2022
Extracting Latent Steering Vectors from Pretrained Language Models
Nishant Subramani
Nivedita Suresh
Matthew E. Peters
LLMSV
36
82
0
10 May 2022
Improving negation detection with negation-focused pre-training
Thinh Hung Truong
Timothy Baldwin
Trevor Cohn
Karin Verspoor
32
21
0
09 May 2022
Multimodal Semi-Supervised Learning for Text Recognition
Aviad Aberdam
Roy Ganz
Shai Mazor
Ron Litman
VLM
28
19
0
08 May 2022
Bridging the Domain Gap for Stance Detection for the Zulu language
Gcinizwe Dlamini
I. E. I. Bekkouch
A. Khan
Leon Derczynski
28
3
0
06 May 2022
To Know by the Company Words Keep and What Else Lies in the Vicinity
Jake Williams
H. Heidenreich
24
0
0
30 Apr 2022
C3-STISR: Scene Text Image Super-resolution with Triple Clues
Minyi Zhao
Miaosen Wang
Fan Bai
Bingjia Li
Jie Wang
Shuigeng Zhou
27
32
0
29 Apr 2022
Backdoor Attacks in Federated Learning by Rare Embeddings and Gradient Ensembling
Kiyoon Yoo
Nojun Kwak
SILM
AAML
FedML
25
19
0
29 Apr 2022
Detecting Textual Adversarial Examples Based on Distributional Characteristics of Data Representations
Na Liu
Mark Dras
Wei Emma Zhang
AAML
24
6
0
29 Apr 2022
Can Rationalization Improve Robustness?
Howard Chen
Jacqueline He
Karthik Narasimhan
Danqi Chen
AAML
31
40
0
25 Apr 2022
MuCoT: Multilingual Contrastive Training for Question-Answering in Low-resource Languages
Gokul Karthik Kumar
Abhishek Singh Gehlot
Sahal Shaji Mullappilly
Karthik Nandakumar
36
13
0
12 Apr 2022
Content and Style Aware Generation of Text-line Images for Handwriting Recognition
Lei Kang
Pau Riba
Marçal Rusiñol
Alicia Fornés
M. Villegas
19
41
0
12 Apr 2022
A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Dragomir R. Radev
Yejin Choi
Noah A. Smith
28
6
0
11 Apr 2022