1609.07843
Cited By
Pointer Sentinel Mixture Models
26 September 2016
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
Papers citing "Pointer Sentinel Mixture Models"
50 / 702 papers shown
Title
Accelerating Retrieval-Augmented Language Model Serving with Speculation
Zhihao Zhang
Alan Zhu
Lijie Yang
Yihua Xu
Lanting Li
P. Phothilimthana
Zhihao Jia
RALM
KELM
56
16
0
25 Jan 2024
MADA: Meta-Adaptive Optimizers through hyper-gradient Descent
Kaan Ozkara
Can Karakus
Parameswaran Raman
Mingyi Hong
Shoham Sabach
B. Kveton
V. Cevher
30
2
0
17 Jan 2024
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Asma Ghandeharioun
Avi Caciularu
Adam Pearce
Lucas Dixon
Mor Geva
37
90
0
11 Jan 2024
HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis
Shiwei Zhang
Lansong Diao
Chuan Wu
Zongyan Cao
Siyu Wang
Wei Lin
43
12
0
11 Jan 2024
The LLM Surgeon
Tycho F. A. van der Ouderaa
Markus Nagel
M. V. Baalen
Yuki Markus Asano
Tijmen Blankevoort
39
14
0
28 Dec 2023
PERP: Rethinking the Prune-Retrain Paradigm in the Era of LLMs
Max Zimmer
Megi Andoni
Christoph Spiegel
Sebastian Pokutta
VLM
55
10
0
23 Dec 2023
Fluctuation-based Adaptive Structured Pruning for Large Language Models
Yongqi An
Xu Zhao
Tao Yu
Ming Tang
Jinqiao Wang
39
42
0
19 Dec 2023
IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition
Xiaomeng Yang
Zhi Qiao
Yu Zhou
DiffM
62
1
0
19 Dec 2023
Delving Deeper Into Astromorphic Transformers
Md. Zesun Ahmed Mia
Malyaban Bal
Abhronil Sengupta
36
1
0
18 Dec 2023
CBQ: Cross-Block Quantization for Large Language Models
Xin Ding
Xiaoyu Liu
Zhijun Tu
Yun-feng Zhang
Wei Li
...
Hanting Chen
Yehui Tang
Zhiwei Xiong
Baoqun Yin
Yunhe Wang
MQ
38
13
0
13 Dec 2023
Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge
Xuan Shen
Peiyan Dong
Lei Lu
Zhenglun Kong
Zhengang Li
Ming Lin
Chao Wu
Yanzhi Wang
MQ
52
25
0
09 Dec 2023
Graph Convolutions Enrich the Self-Attention in Transformers!
Jeongwhan Choi
Hyowon Wi
Jayoung Kim
Yehjin Shin
Kookjin Lee
Nathaniel Trask
Noseong Park
40
4
0
07 Dec 2023
Run LoRA Run: Faster and Lighter LoRA Implementations
Daria Cherniuk
A. Mikhalev
Ivan Oseledets
AI4CE
22
1
0
06 Dec 2023
FlexModel: A Framework for Interpretability of Distributed Large Language Models
Matthew Choi
Muhammad Adil Asif
John Willes
David Emerson
AI4CE
ALM
30
1
0
05 Dec 2023
Revisiting Topic-Guided Language Models
Carolina Zheng
Keyon Vafa
David M. Blei
BDL
35
1
0
04 Dec 2023
Power Hungry Processing: Watts Driving the Cost of AI Deployment?
Sasha Luccioni
Yacine Jernite
Emma Strubell
44
163
0
28 Nov 2023
StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization
Shida Wang
Qianxiao Li
22
13
0
24 Nov 2023
LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning
Han Guo
P. Greengard
Eric P. Xing
Yoon Kim
MQ
38
44
0
20 Nov 2023
Beyond Boundaries: A Comprehensive Survey of Transferable Attacks on AI Systems
Guangjing Wang
Ce Zhou
Yuanda Wang
Bocheng Chen
Hanqing Guo
Qiben Yan
AAML
SILM
68
3
0
20 Nov 2023
Dual input stream transformer for vertical drift correction in eye-tracking reading data
Thomas M. Mercier
Marcin Budka
Martin R. Vasilev
Julie A. Kirkby
Bernhard Angele
T. Slattery
37
3
0
10 Nov 2023
TorchDEQ: A Library for Deep Equilibrium Models
Zhengyang Geng
J. Zico Kolter
VLM
62
12
0
28 Oct 2023
Codebook Features: Sparse and Discrete Interpretability for Neural Networks
Alex Tamkin
Mohammad Taufeeque
Noah D. Goodman
35
27
0
26 Oct 2023
FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering
Md. Rafi Ur Rashid
Vishnu Asutosh Dasu
Kang Gu
Najrin Sultana
Shagufta Mehnaz
AAML
FedML
46
10
0
24 Oct 2023
Bridging Information-Theoretic and Geometric Compression in Language Models
Emily Cheng
Corentin Kervadec
Marco Baroni
36
17
0
20 Oct 2023
Model Merging by Uncertainty-Based Gradient Matching
Nico Daheim
Thomas Möllenhoff
Edoardo Ponti
Iryna Gurevych
Mohammad Emtiyaz Khan
MoMe
FedML
32
45
0
19 Oct 2023
Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model
Abhijith Chintam
Rahel Beloch
Willem H. Zuidema
Michael Hanna
Oskar van der Wal
28
16
0
19 Oct 2023
Disentangling the Linguistic Competence of Privacy-Preserving BERT
Stefan Arnold
Nils Kemmerzell
Annika Schreiner
35
0
0
17 Oct 2023
Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs
Yuxin Zhang
Lirui Zhao
Mingbao Lin
Yunyun Sun
Yiwu Yao
Xingjia Han
Jared Tanner
Shiwei Liu
Rongrong Ji
SyDa
45
40
0
13 Oct 2023
CCAE: A Corpus of Chinese-based Asian Englishes
Yang Liu
Melissa Xiaohui Qin
Long Wang
Chao-Wei Huang
28
0
0
09 Oct 2023
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
Lu Yin
You Wu
Zhenyu Zhang
Cheng-Yu Hsieh
Yaqing Wang
...
Mykola Pechenizkiy
Yi Liang
Michael Bendersky
Zhangyang Wang
Shiwei Liu
36
79
0
08 Oct 2023
Pre-training with Synthetic Data Helps Offline Reinforcement Learning
Zecheng Wang
Che Wang
Zixuan Dong
Keith Ross
OffRL
38
5
0
01 Oct 2023
Augmenting Transformers with Recursively Composed Multi-grained Representations
Xiang Hu
Qingyang Zhu
Kewei Tu
Wei Wu
34
3
0
28 Sep 2023
Forgetting Private Textual Sequences in Language Models via Leave-One-Out Ensemble
Zhe Liu
Ozlem Kalinli
MU
KELM
28
2
0
28 Sep 2023
Learning to Diversify Neural Text Generation via Degenerative Model
Jimin Hong
chaeHun Park
Jaegul Choo
34
0
0
22 Sep 2023
Recovering from Privacy-Preserving Masking with Large Language Models
A. Vats
Zhe Liu
Peng Su
Debjyoti Paul
Yingyi Ma
Yutong Pang
Zeeshan Ahmed
Ozlem Kalinli
31
9
0
12 Sep 2023
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
Liang Li
Qingyuan Li
Bo-Wen Zhang
Xiangxiang Chu
MQ
47
29
0
06 Sep 2023
How to Protect Copyright Data in Optimization of Large Language Models?
T. Chu
Zhao Song
Chiwun Yang
45
29
0
23 Aug 2023
Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets
Paul Primus
Khaled Koutini
Gerhard Widmer
32
13
0
08 Aug 2023
Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models
Seungcheol Park
Ho-Jin Choi
U. Kang
VLM
42
5
0
07 Aug 2023
Advancing Beyond Identification: Multi-bit Watermark for Large Language Models
Kiyoon Yoo
Wonhyuk Ahn
Nojun Kwak
WaLM
35
17
0
01 Aug 2023
QuIP: 2-Bit Quantization of Large Language Models With Guarantees
Jerry Chee
Yaohui Cai
Volodymyr Kuleshov
Chris De Sa
MQ
51
189
0
25 Jul 2023
What can we learn from Data Leakage and Unlearning for Law?
Jaydeep Borkar
PILM
MU
38
10
0
19 Jul 2023
DP-TBART: A Transformer-based Autoregressive Model for Differentially Private Tabular Data Generation
Rodrigo Castellon
Achintya Gopal
Brian Bloniarz
David S. Rosenberg
26
8
0
19 Jul 2023
Accelerating Distributed ML Training via Selective Synchronization
S. Tyagi
Martin Swany
FedML
41
3
0
16 Jul 2023
FedBIAD: Communication-Efficient and Accuracy-Guaranteed Federated Learning with Bayesian Inference-Based Adaptive Dropout
Jingjing Xue
Min Liu
Sheng Sun
Yuwei Wang
Hui Jiang
Xue Jiang
21
7
0
14 Jul 2023
QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models
Tommaso Pegolotti
Elias Frantar
Dan Alistarh
Markus Püschel
MQ
24
3
0
07 Jul 2023
Prompt Tuning Pushes Farther, Contrastive Learning Pulls Closer: A Two-Stage Approach to Mitigate Social Biases
Yingji Li
Mengnan Du
Xin Wang
Ying Wang
53
27
0
04 Jul 2023
Mitigating the Learning Bias towards Repetition by Self-Contrastive Training for Open-Ended Generation
Jian Guan
Minlie Huang
32
0
0
04 Jul 2023
Personality Traits in Large Language Models
Gregory Serapio-García
Mustafa Safdari
Clément Crepy
Luning Sun
Stephen Fitz
P. Romero
Marwa Abdulhai
Aleksandra Faust
Maja J. Matarić
LM&MA
LLMAG
58
119
0
01 Jul 2023
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Kuan-Fu Ding
Jingyang Li
Kim-Chuan Toh
33
8
0
26 Jun 2023