1609.07843
Pointer Sentinel Mixture Models
26 September 2016
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
Papers citing
"Pointer Sentinel Mixture Models"
50 / 696 papers shown
Title
A Plug-and-Play Method for Controlled Text Generation
Damian Pascual
Béni Egressy
Clara Meister
Ryan Cotterell
Roger Wattenhofer
27
89
0
20 Sep 2021
Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration
Shufan Wang
Laure Thompson
Mohit Iyyer
180
66
0
13 Sep 2021
Assessing the Reliability of Word Embedding Gender Bias Measures
Yupei Du
Qixiang Fang
D. Nguyen
46
21
0
10 Sep 2021
Asynchronous Federated Learning on Heterogeneous Devices: A Survey
Chenhao Xu
Youyang Qu
Yong Xiang
Longxiang Gao
FedML
104
245
0
09 Sep 2021
Efficient Nearest Neighbor Language Models
Junxian He
Graham Neubig
Taylor Berg-Kirkpatrick
RALM
195
103
0
09 Sep 2021
Rare Tokens Degenerate All Tokens: Improving Neural Text Generation via Adaptive Gradient Gating for Rare Token Embeddings
Sangwon Yu
Jongyoon Song
Heeseung Kim
SeongEun Lee
Woo-Jong Ryu
Sung-Hoon Yoon
19
31
0
07 Sep 2021
PermuteFormer: Efficient Relative Position Encoding for Long Sequences
Peng-Jen Chen
36
21
0
06 Sep 2021
LegaLMFiT: Efficient Short Legal Text Classification with LSTM Language Model Pre-Training
Benjamin Clavié
Akshita Gheewala
Paul Briton
Marc Alphonsus
Rym Labiyaad
Francesco Piccoli
VLM
AILaw
32
2
0
02 Sep 2021
LOT: A Story-Centric Benchmark for Evaluating Chinese Long Text Understanding and Generation
Jian Guan
Zhuoer Feng
Yamei Chen
Ru He
Xiaoxi Mao
Changjie Fan
Minlie Huang
39
32
0
30 Aug 2021
Selective Differential Privacy for Language Modeling
Weiyan Shi
Aiqi Cui
Evan Li
R. Jia
Zhou Yu
20
68
0
30 Aug 2021
Offensive Language Identification in Low-resourced Code-mixed Dravidian languages using Pseudo-labeling
Adeep Hande
Karthik Puranik
Konthala Yasaswini
R. Priyadharshini
Sajeetha Thavareesan
Anbukkarasi Sampath
Kogilavani Shanmugavadivel
D. Thenmozhi
Bharathi Raja Chakravarthi
29
29
0
27 Aug 2021
Curriculum learning for language modeling
Daniel Fernando Campos
16
32
0
04 Aug 2021
Rethinking gradient sparsification as total error minimization
Atal Narayan Sahu
Aritra Dutta
A. Abdelmoniem
Trambak Banerjee
Marco Canini
Panos Kalnis
45
56
0
02 Aug 2021
On the Evaluation of Neural Code Summarization
Ensheng Shi
Yanlin Wang
Lun Du
Junjie Chen
Shi Han
Hongyu Zhang
Dongmei Zhang
Hongbin Sun
ELM
122
86
0
15 Jul 2021
Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence
Alexander Miserlis Hoyle
Pranav Goel
Denis Peskov
Andrew Hian-Cheong
Jordan L. Boyd-Graber
Philip Resnik
41
128
0
05 Jul 2021
R-Drop: Regularized Dropout for Neural Networks
Xiaobo Liang
Lijun Wu
Juntao Li
Yue Wang
Qi Meng
Tao Qin
Wei Chen
Hao Fei
Tie-Yan Liu
47
424
0
28 Jun 2021
Stabilizing Equilibrium Models by Jacobian Regularization
Shaojie Bai
V. Koltun
J. Zico Kolter
33
57
0
28 Jun 2021
Private Adaptive Gradient Methods for Convex Optimization
Hilal Asi
John C. Duchi
Alireza Fallah
O. Javidbakht
Kunal Talwar
19
53
0
25 Jun 2021
Multi-objective Asynchronous Successive Halving
Robin Schmucker
Michele Donini
Muhammad Bilal Zafar
David Salinas
Cédric Archambeau
32
23
0
23 Jun 2021
Secure Distributed Training at Scale
Eduard A. Gorbunov
Alexander Borzunov
Michael Diskin
Max Ryabinin
FedML
26
15
0
21 Jun 2021
What Context Features Can Transformer Language Models Use?
J. O'Connor
Jacob Andreas
KELM
29
75
0
15 Jun 2021
Divergence Frontiers for Generative Models: Sample Complexity, Quantization Effects, and Frontier Integrals
Lang Liu
Krishna Pillutla
Sean Welleck
Sewoong Oh
Yejin Choi
Zaïd Harchaoui
MQ
28
14
0
15 Jun 2021
Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation
Xiang Lin
Simeng Han
Chenyu You
20
24
0
14 Jun 2021
Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
Kazuki Irie
Imanol Schlag
Róbert Csordás
Jürgen Schmidhuber
33
57
0
11 Jun 2021
Linguistically Informed Masking for Representation Learning in the Patent Domain
Sophia Althammer
Mark Buckley
Sebastian Hofstatter
Allan Hanbury
42
11
0
10 Jun 2021
Self-Supervised Bug Detection and Repair
Miltiadis Allamanis
Henry Jackson-Flux
Marc Brockschmidt
23
103
0
26 May 2021
A Cognitive Regularizer for Language Modeling
Jason W. Wei
Clara Meister
Ryan Cotterell
19
21
0
15 May 2021
AT-ST: Self-Training Adaptation Strategy for OCR in Domains with Limited Transcriptions
M. Kišš
Karel Beneš
Michal Hradiš
64
13
0
27 Apr 2021
Differentiable Model Compression via Pseudo Quantization Noise
Alexandre Défossez
Yossi Adi
Gabriel Synnaeve
DiffM
MQ
18
47
0
20 Apr 2021
Broccoli: Sprinkling Lightweight Vocabulary Learning into Everyday Information Diets
Roland Aydin
Lars Klein
Arnaud Miribel
Robert West
18
1
0
16 Apr 2021
Finetuning Pretrained Transformers into RNNs
Jungo Kasai
Hao Peng
Yizhe Zhang
Dani Yogatama
Gabriel Ilharco
Nikolaos Pappas
Yi Mao
Weizhu Chen
Noah A. Smith
44
63
0
24 Mar 2021
Full Page Handwriting Recognition via Image to Sequence Extraction
Sumeet S. Singh
Sergey Karayev
27
53
0
11 Mar 2021
Random Feature Attention
Hao Peng
Nikolaos Pappas
Dani Yogatama
Roy Schwartz
Noah A. Smith
Lingpeng Kong
36
349
0
03 Mar 2021
The Rediscovery Hypothesis: Language Models Need to Meet Linguistics
Vassilina Nikoulina
Maxat Tezekbayev
Nuradil Kozhakhmet
Madina Babazhanova
Matthias Gallé
Z. Assylbekov
34
8
0
02 Mar 2021
Linear Transformers Are Secretly Fast Weight Programmers
Imanol Schlag
Kazuki Irie
Jürgen Schmidhuber
46
225
0
22 Feb 2021
Dancing along Battery: Enabling Transformer with Run-time Reconfigurability on Mobile Devices
Yuhong Song
Weiwen Jiang
Bingbing Li
Panjie Qi
Qingfeng Zhuge
E. Sha
Sakyasingha Dasgupta
Yiyu Shi
Caiwen Ding
18
18
0
12 Feb 2021
A Comprehensive Survey on Hardware-Aware Neural Architecture Search
Hadjer Benmeziane
Kaoutar El Maghraoui
Hamza Ouarnoughi
Smail Niar
Martin Wistuba
Naigang Wang
34
96
0
22 Jan 2021
AutoDropout: Learning Dropout Patterns to Regularize Deep Networks
Hieu H. Pham
Quoc V. Le
76
56
0
05 Jan 2021
Shortformer: Better Language Modeling using Shorter Inputs
Ofir Press
Noah A. Smith
M. Lewis
230
89
0
31 Dec 2020
ERNIE-Doc: A Retrospective Long-Document Modeling Transformer
Siyu Ding
Junyuan Shang
Shuohuan Wang
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
73
52
0
31 Dec 2020
Towards Zero-Shot Knowledge Distillation for Natural Language Processing
Ahmad Rashid
Vasileios Lioutas
Abbas Ghaddar
Mehdi Rezagholizadeh
21
27
0
31 Dec 2020
CLEAR: Contrastive Learning for Sentence Representation
Zhuofeng Wu
Sinong Wang
Jiatao Gu
Madian Khabsa
Fei Sun
Hao Ma
SSL
33
320
0
31 Dec 2020
Transformer Feed-Forward Layers Are Key-Value Memories
Mor Geva
R. Schuster
Jonathan Berant
Omer Levy
KELM
39
747
0
29 Dec 2020
A Theoretical Analysis of the Repetition Problem in Text Generation
Z. Fu
Wai Lam
Anthony Man-Cho So
Bei Shi
79
90
0
29 Dec 2020
FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training
Y. Fu
Haoran You
Yang Katie Zhao
Yue Wang
Chaojian Li
K. Gopalakrishnan
Zhangyang Wang
Yingyan Lin
MQ
38
32
0
24 Dec 2020
SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
Hanrui Wang
Zhekai Zhang
Song Han
43
380
0
17 Dec 2020
Multi-Sense Language Modelling
Andrea Lekkas
Peter Schneider-Kamp
Isabelle Augenstein
KELM
11
2
0
10 Dec 2020
Efficient Estimation of Influence of a Training Instance
Sosuke Kobayashi
Sho Yokoi
Jun Suzuki
Kentaro Inui
TDI
32
15
0
08 Dec 2020
Adversarial Semantic Collisions
Congzheng Song
Alexander M. Rush
Vitaly Shmatikov
AAML
14
52
0
09 Nov 2020
CxGBERT: BERT meets Construction Grammar
Harish Tayyar Madabushi
Laurence Romain
Dagmar Divjak
P. Milin
19
40
0
09 Nov 2020