ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Pointer Sentinel Mixture Models (arXiv:1609.07843)
26 September 2016
Stephen Merity, Caiming Xiong, James Bradbury, R. Socher
Topic: RALM

Papers citing "Pointer Sentinel Mixture Models"

50 of 696 citing papers shown. Each entry lists title, authors, topic tags (if any), listing counts as shown on the site, and date.
1. Adversarial Black-Box Attacks On Text Classifiers Using Multi-Objective Genetic Optimization Guided By Deep Networks
   Alex Mathai, Shreya Khare, Srikanth G. Tamilselvam, Senthil Mani · AAML · 36/6/0 · 08 Nov 2020
2. Concealed Data Poisoning Attacks on NLP Models
   Eric Wallace, Tony Zhao, Shi Feng, Sameer Singh · SILM · 19/18/0 · 23 Oct 2020
3. Limitations of Autoregressive Models and Their Alternatives
   Chu-cheng Lin, Aaron Jaech, Xin Li, Matthew R. Gormley, Jason Eisner · 29/58/0 · 22 Oct 2020
4. Cross Copy Network for Dialogue Generation
   Changzhen Ji, Xiaoxia Zhou, Yating Zhang, Xiaozhong Liu, Changlong Sun, Conghui Zhu, T. Zhao · 27/11/0 · 22 Oct 2020
5. Dual Averaging is Surprisingly Effective for Deep Learning Optimization
   Samy Jelassi, Aaron Defazio · 36/4/0 · 20 Oct 2020
6. Memformer: A Memory-Augmented Transformer for Sequence Modeling
   Qingyang Wu, Zhenzhong Lan, Kun Qian, Jing Gu, A. Geramifard, Zhou Yu · 22/49/0 · 14 Oct 2020
7. Are Some Words Worth More than Others?
   Shiran Dudy, Steven Bedrick · 18/14/0 · 12 Oct 2020
8. Plan ahead: Self-Supervised Text Planning for Paragraph Completion Task
   Dongyeop Kang, Eduard H. Hovy · LRM · 42/24/0 · 11 Oct 2020
9. Knowledge-Enriched Distributional Model Inversion Attacks
   Si-An Chen, Mostafa Kahla, R. Jia, Guo-Jun Qi · 24/93/0 · 08 Oct 2020
10. Learning to Recombine and Resample Data for Compositional Generalization
    Ekin Akyürek, Afra Feyza Akyürek, Jacob Andreas · 29/79/0 · 08 Oct 2020
11. A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks
    Nikunj Saunshi, Sadhika Malladi, Sanjeev Arora · 22/87/0 · 07 Oct 2020
12. HeteroFL: Computation and Communication Efficient Federated Learning for Heterogeneous Clients
    Enmao Diao, Jie Ding, Vahid Tarokh · FedML · 46/546/0 · 03 Oct 2020
13. Data-Efficient Pretraining via Contrastive Self-Supervision
    Nils Rethmeier, Isabelle Augenstein · 28/20/0 · 02 Oct 2020
14. Improving Low Compute Language Modeling with In-Domain Embedding Initialisation
    Charles F Welch, Rada Mihalcea, Jonathan K. Kummerfeld · AI4CE · 19/4/0 · 29 Sep 2020
15. Multi-timescale Representation Learning in LSTM Language Models
    Shivangi Mahto, Vy A. Vo, Javier S. Turek, Alexander G. Huth · 15/29/0 · 27 Sep 2020
16. Automated Source Code Generation and Auto-completion Using Deep Learning: Comparing and Discussing Current Language-Model-Related Approaches
    Juan Cruz-Benito, Sanjay Vishwakarma, Francisco Martín-Fernández, Ismael Faro (IBM Quantum) · 22/30/0 · 16 Sep 2020
17. Efficient Transformers: A Survey
    Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler · VLM · 114/1,103/0 · 14 Sep 2020
18. Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding
    Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, S. Sun, Yu Cheng, Jingjing Liu · 43/28/0 · 13 Sep 2020
19. Probabilistic Predictions of People Perusing: Evaluating Metrics of Language Model Performance for Psycholinguistic Modeling
    Sophie Hao, S. Mendelsohn, Rachel Sterneck, Randi Martinez, Robert Frank · 19/46/0 · 08 Sep 2020
20. Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding
    Sahar Abdelnabi, Mario Fritz · WaLM · 28/44/0 · 07 Sep 2020
21. Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training
    Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li · ODL · 26/8/0 · 28 Jul 2020
22. FTRANS: Energy-Efficient Acceleration of Transformers using FPGA
    Bingbing Li, Santosh Pandey, Haowen Fang, Yanjun Lyv, Ji Li, Jieyang Chen, Mimi Xie, Lipeng Wan, Hang Liu, Caiwen Ding · AI4CE · 16/170/0 · 16 Jul 2020
23. Hopfield Networks is All You Need
    Hubert Ramsauer, Bernhard Schafl, Johannes Lehner, Philipp Seidl, Michael Widrich, ..., David P. Kreil, Michael K Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter · 24/415/0 · 16 Jul 2020
24. A Survey of Privacy Attacks in Machine Learning
    M. Rigaki, Sebastian Garcia · PILM, AAML · 39/213/0 · 15 Jul 2020
25. Term Revealing: Furthering Quantization at Run Time on Quantized DNNs
    H. T. Kung, Bradley McDanel, S. Zhang · MQ · 21/9/0 · 13 Jul 2020
26. Climbing the WOL: Training for Cheaper Inference
    Zichang Liu, Zhaozhuo Xu, A. Ji, Jonathan Li, Beidi Chen, Anshumali Shrivastava · TPM · 24/7/0 · 02 Jul 2020
27. Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures
    Julien Launay, Iacopo Poli, François Boniface, Florent Krzakala · 41/63/0 · 23 Jun 2020
28. Extension of Direct Feedback Alignment to Convolutional and Recurrent Neural Network for Bio-plausible Deep Learning
    Donghyeon Han, Gwangtae Park, Junha Ryu, H. Yoo · 3DV · 15/5/0 · 23 Jun 2020
29. The Depth-to-Width Interplay in Self-Attention
    Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, Amnon Shashua · 30/45/0 · 22 Jun 2020
30. Learning to Prove from Synthetic Theorems
    Eser Aygun, Zafarali Ahmed, Ankit Anand, Vlad Firoiu, Xavier Glorot, Laurent Orseau, Doina Precup, Shibl Mourad · NAI · 20/20/0 · 19 Jun 2020
31. Categorical Normalizing Flows via Continuous Transformations
    Phillip Lippe, E. Gavves · BDL · 21/43/0 · 17 Jun 2020
32. NAS-Bench-NLP: Neural Architecture Search Benchmark for Natural Language Processing
    Nikita Klyuchnikov, I. Trofimov, Ekaterina Artemova, Mikhail Salnikov, M. Fedorov, Evgeny Burnaev · VLM · 21/101/0 · 12 Jun 2020
33. On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
    Marius Mosbach, Maksym Andriushchenko, Dietrich Klakow · 31/352/0 · 08 Jun 2020
34. Copy that! Editing Sequences by Copying Spans
    Sheena Panthaplackel, Miltiadis Allamanis, Marc Brockschmidt · BDL · 21/28/0 · 08 Jun 2020
35. Linformer: Self-Attention with Linear Complexity
    Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma · 66/1,651/0 · 08 Jun 2020
36. DeBERTa: Decoding-enhanced BERT with Disentangled Attention
    Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen · AAML · 64/2,626/0 · 05 Jun 2020
37. ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
    Z. Yao, A. Gholami, Sheng Shen, Mustafa Mustafa, Kurt Keutzer, Michael W. Mahoney · ODL · 39/275/0 · 01 Jun 2020
38. Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model
    Da-Rong Liu, Chunxi Liu, Frank Zhang, Gabriel Synnaeve, Yatharth Saraf, Geoffrey Zweig · 28/19/0 · 15 May 2020
39. A Tale of Two Perplexities: Sensitivity of Neural Language Models to Lexical Retrieval Deficits in Dementia of the Alzheimer's Type
    T. Cohen, Serguei V. S. Pakhomov · 22/25/0 · 07 May 2020
40. DQI: Measuring Data Quality in NLP
    Swaroop Mishra, Anjana Arunkumar, Bhavdeep Singh Sachdeva, Chris Bryan, Chitta Baral · 36/30/0 · 02 May 2020
41. BERT-kNN: Adding a kNN Search Component to Pretrained Language Models for Better QA
    Nora Kassner, Hinrich Schütze · RALM · 21/68/0 · 02 May 2020
42. Dynamic Sampling and Selective Masking for Communication-Efficient Federated Learning
    Shaoxiong Ji, Wenqi Jiang, A. Walid, Xue Li · FedML · 28/66/0 · 21 Mar 2020
43. Learning to Encode Position for Transformer with Continuous Dynamical Model
    Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh · 16/107/0 · 13 Mar 2020
44. ReZero is All You Need: Fast Convergence at Large Depth
    Thomas C. Bachlechner, Bodhisattwa Prasad Majumder, H. H. Mao, G. Cottrell, Julian McAuley · AI4CE · 30/276/0 · 10 Mar 2020
45. Temporal Convolutional Attention-based Network For Sequence Modeling
    Hongyan Hao, Yan Wang, Siqiao Xue, Yudi Xia, Jian Zhao, S. Furao · 30/41/0 · 28 Feb 2020
46. Statistical Adaptive Stochastic Gradient Methods
    Pengchuan Zhang, Hunter Lang, Qiang Liu, Lin Xiao · ODL · 15/11/0 · 25 Feb 2020
47. Limits of Detecting Text Generated by Large-Scale Language Models
    L. Varshney, N. Keskar, R. Socher · DeLMO · 21/18/0 · 09 Feb 2020
48. On the distance between two neural networks and the stability of learning
    Jeremy Bernstein, Arash Vahdat, Yisong Yue, Xuan Li · ODL · 200/57/0 · 09 Feb 2020
49. Consistency of a Recurrent Language Model With Respect to Incomplete Decoding
    Sean Welleck, Ilia Kulikov, Jaedeok Kim, Richard Yuanzhe Pang, Kyunghyun Cho · 17/65/0 · 06 Feb 2020
50. Single Headed Attention RNN: Stop Thinking With Your Head
    Stephen Merity · 27/68/0 · 26 Nov 2019