ResearchTrend.AI
Pointer Sentinel Mixture Models

26 September 2016
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
    RALM

Papers citing "Pointer Sentinel Mixture Models"

50 / 702 papers shown
Accelerating Retrieval-Augmented Language Model Serving with Speculation
Zhihao Zhang
Alan Zhu
Lijie Yang
Yihua Xu
Lanting Li
P. Phothilimthana
Zhihao Jia
RALM
KELM
56
16
0
25 Jan 2024
MADA: Meta-Adaptive Optimizers through hyper-gradient Descent
Kaan Ozkara
Can Karakus
Parameswaran Raman
Mingyi Hong
Shoham Sabach
B. Kveton
V. Cevher
30
2
0
17 Jan 2024
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Asma Ghandeharioun
Avi Caciularu
Adam Pearce
Lucas Dixon
Mor Geva
37
90
0
11 Jan 2024
HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis
Shiwei Zhang
Lansong Diao
Chuan Wu
Zongyan Cao
Siyu Wang
Wei Lin
43
12
0
11 Jan 2024
The LLM Surgeon
Tycho F. A. van der Ouderaa
Markus Nagel
M. V. Baalen
Yuki Markus Asano
Tijmen Blankevoort
39
14
0
28 Dec 2023
PERP: Rethinking the Prune-Retrain Paradigm in the Era of LLMs
Max Zimmer
Megi Andoni
Christoph Spiegel
Sebastian Pokutta
VLM
55
10
0
23 Dec 2023
Fluctuation-based Adaptive Structured Pruning for Large Language Models
Yongqi An
Xu Zhao
Tao Yu
Ming Tang
Jinqiao Wang
39
42
0
19 Dec 2023
IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition
Xiaomeng Yang
Zhi Qiao
Yu Zhou
DiffM
62
1
0
19 Dec 2023
Delving Deeper Into Astromorphic Transformers
Md. Zesun Ahmed Mia
Malyaban Bal
Abhronil Sengupta
36
1
0
18 Dec 2023
CBQ: Cross-Block Quantization for Large Language Models
Xin Ding
Xiaoyu Liu
Zhijun Tu
Yun-feng Zhang
Wei Li
...
Hanting Chen
Yehui Tang
Zhiwei Xiong
Baoqun Yin
Yunhe Wang
MQ
38
13
0
13 Dec 2023
Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge
Xuan Shen
Peiyan Dong
Lei Lu
Zhenglun Kong
Zhengang Li
Ming Lin
Chao Wu
Yanzhi Wang
MQ
52
25
0
09 Dec 2023
Graph Convolutions Enrich the Self-Attention in Transformers!
Jeongwhan Choi
Hyowon Wi
Jayoung Kim
Yehjin Shin
Kookjin Lee
Nathaniel Trask
Noseong Park
40
4
0
07 Dec 2023
Run LoRA Run: Faster and Lighter LoRA Implementations
Daria Cherniuk
A. Mikhalev
Ivan Oseledets
AI4CE
22
1
0
06 Dec 2023
FlexModel: A Framework for Interpretability of Distributed Large Language Models
Matthew Choi
Muhammad Adil Asif
John Willes
David Emerson
AI4CE
ALM
30
1
0
05 Dec 2023
Revisiting Topic-Guided Language Models
Carolina Zheng
Keyon Vafa
David M. Blei
BDL
35
1
0
04 Dec 2023
Power Hungry Processing: Watts Driving the Cost of AI Deployment?
Sasha Luccioni
Yacine Jernite
Emma Strubell
44
163
0
28 Nov 2023
StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization
Shida Wang
Qianxiao Li
22
13
0
24 Nov 2023
LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning
Han Guo
P. Greengard
Eric P. Xing
Yoon Kim
MQ
38
44
0
20 Nov 2023
Beyond Boundaries: A Comprehensive Survey of Transferable Attacks on AI Systems
Guangjing Wang
Ce Zhou
Yuanda Wang
Bocheng Chen
Hanqing Guo
Qiben Yan
AAML
SILM
68
3
0
20 Nov 2023
Dual input stream transformer for vertical drift correction in eye-tracking reading data
Thomas M. Mercier
Marcin Budka
Martin R. Vasilev
Julie A. Kirkby
Bernhard Angele
T. Slattery
37
3
0
10 Nov 2023
TorchDEQ: A Library for Deep Equilibrium Models
Zhengyang Geng
J. Zico Kolter
VLM
62
12
0
28 Oct 2023
Codebook Features: Sparse and Discrete Interpretability for Neural Networks
Alex Tamkin
Mohammad Taufeeque
Noah D. Goodman
35
27
0
26 Oct 2023
FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering
Md. Rafi Ur Rashid
Vishnu Asutosh Dasu
Kang Gu
Najrin Sultana
Shagufta Mehnaz
AAML
FedML
46
10
0
24 Oct 2023
Bridging Information-Theoretic and Geometric Compression in Language Models
Emily Cheng
Corentin Kervadec
Marco Baroni
36
17
0
20 Oct 2023
Model Merging by Uncertainty-Based Gradient Matching
Nico Daheim
Thomas Möllenhoff
Edoardo Ponti
Iryna Gurevych
Mohammad Emtiyaz Khan
MoMe
FedML
32
45
0
19 Oct 2023
Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model
Abhijith Chintam
Rahel Beloch
Willem H. Zuidema
Michael Hanna
Oskar van der Wal
28
16
0
19 Oct 2023
Disentangling the Linguistic Competence of Privacy-Preserving BERT
Stefan Arnold
Nils Kemmerzell
Annika Schreiner
35
0
0
17 Oct 2023
Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs
Yuxin Zhang
Lirui Zhao
Mingbao Lin
Yunyun Sun
Yiwu Yao
Xingjia Han
Jared Tanner
Shiwei Liu
Rongrong Ji
SyDa
45
40
0
13 Oct 2023
CCAE: A Corpus of Chinese-based Asian Englishes
Yang Liu
Melissa Xiaohui Qin
Long Wang
Chao-Wei Huang
28
0
0
09 Oct 2023
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
Lu Yin
You Wu
Zhenyu Zhang
Cheng-Yu Hsieh
Yaqing Wang
...
Mykola Pechenizkiy
Yi Liang
Michael Bendersky
Zhangyang Wang
Shiwei Liu
36
79
0
08 Oct 2023
Pre-training with Synthetic Data Helps Offline Reinforcement Learning
Zecheng Wang
Che Wang
Zixuan Dong
Keith Ross
OffRL
38
5
0
01 Oct 2023
Augmenting Transformers with Recursively Composed Multi-grained Representations
Xiang Hu
Qingyang Zhu
Kewei Tu
Wei Wu
34
3
0
28 Sep 2023
Forgetting Private Textual Sequences in Language Models via Leave-One-Out Ensemble
Zhe Liu
Ozlem Kalinli
MU
KELM
28
2
0
28 Sep 2023
Learning to Diversify Neural Text Generation via Degenerative Model
Jimin Hong
ChaeHun Park
Jaegul Choo
34
0
0
22 Sep 2023
Recovering from Privacy-Preserving Masking with Large Language Models
A. Vats
Zhe Liu
Peng Su
Debjyoti Paul
Yingyi Ma
Yutong Pang
Zeeshan Ahmed
Ozlem Kalinli
31
9
0
12 Sep 2023
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
Liang Li
Qingyuan Li
Bo-Wen Zhang
Xiangxiang Chu
MQ
47
29
0
06 Sep 2023
How to Protect Copyright Data in Optimization of Large Language Models?
T. Chu
Zhao Song
Chiwun Yang
45
29
0
23 Aug 2023
Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets
Paul Primus
Khaled Koutini
Gerhard Widmer
32
13
0
08 Aug 2023
Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models
Seungcheol Park
Ho-Jin Choi
U. Kang
VLM
42
5
0
07 Aug 2023
Advancing Beyond Identification: Multi-bit Watermark for Large Language Models
Kiyoon Yoo
Wonhyuk Ahn
Nojun Kwak
WaLM
35
17
0
01 Aug 2023
QuIP: 2-Bit Quantization of Large Language Models With Guarantees
Jerry Chee
Yaohui Cai
Volodymyr Kuleshov
Chris De Sa
MQ
51
189
0
25 Jul 2023
What can we learn from Data Leakage and Unlearning for Law?
Jaydeep Borkar
PILM
MU
38
10
0
19 Jul 2023
DP-TBART: A Transformer-based Autoregressive Model for Differentially Private Tabular Data Generation
Rodrigo Castellon
Achintya Gopal
Brian Bloniarz
David S. Rosenberg
26
8
0
19 Jul 2023
Accelerating Distributed ML Training via Selective Synchronization
S. Tyagi
Martin Swany
FedML
41
3
0
16 Jul 2023
FedBIAD: Communication-Efficient and Accuracy-Guaranteed Federated Learning with Bayesian Inference-Based Adaptive Dropout
Jingjing Xue
Min Liu
Sheng Sun
Yuwei Wang
Hui Jiang
Xue Jiang
21
7
0
14 Jul 2023
QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models
Tommaso Pegolotti
Elias Frantar
Dan Alistarh
Markus Püschel
MQ
24
3
0
07 Jul 2023
Prompt Tuning Pushes Farther, Contrastive Learning Pulls Closer: A Two-Stage Approach to Mitigate Social Biases
Yingji Li
Mengnan Du
Xin Wang
Ying Wang
53
27
0
04 Jul 2023
Mitigating the Learning Bias towards Repetition by Self-Contrastive Training for Open-Ended Generation
Jian Guan
Minlie Huang
32
0
0
04 Jul 2023
Personality Traits in Large Language Models
Gregory Serapio-García
Mustafa Safdari
Clément Crepy
Luning Sun
Stephen Fitz
P. Romero
Marwa Abdulhai
Aleksandra Faust
Maja J. Matarić
LM&MA
LLMAG
58
119
0
01 Jul 2023
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Kuan-Fu Ding
Jingyang Li
Kim-Chuan Toh
33
8
0
26 Jun 2023