ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1609.07843
  4. Cited By
Pointer Sentinel Mixture Models

Pointer Sentinel Mixture Models

26 September 2016
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
    RALM
ArXivPDFHTML

Papers citing "Pointer Sentinel Mixture Models"

50 / 696 papers shown
Title
LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models
LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models
Hieu Man
Nghia Trung Ngo
Viet Dac Lai
Ryan Rossi
Franck Dernoncourt
T. Nguyen
220
0
0
01 Jan 2025
LSAQ: Layer-Specific Adaptive Quantization for Large Language Model Deployment
LSAQ: Layer-Specific Adaptive Quantization for Large Language Model Deployment
Binrui Zeng
Bin Ji
Xiaodong Liu
Jie Yu
Shasha Li
Jun Ma
Xiaopeng Li
Shangwen Wang
Xinran Hong
Yongtao Tang
MQ
42
1
0
24 Dec 2024
GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference
GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference
Chao Zeng
Songwei Liu
Shu Yang
Fangmin Chen
Xing Mei
Lean Fu
MQ
44
0
0
23 Dec 2024
Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHRs
Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHRs
Michael Wornow
Suhana Bedi
Miguel Angel Fuentes Hernandez
E. Steinberg
Jason Alan Fries
Christopher Ré
Sanmi Koyejo
N. Shah
100
4
0
09 Dec 2024
Taming Sensitive Weights : Noise Perturbation Fine-tuning for Robust LLM Quantization
Taming Sensitive Weights : Noise Perturbation Fine-tuning for Robust LLM Quantization
Dongwei Wang
Huanrui Yang
MQ
90
1
0
08 Dec 2024
Unifying KV Cache Compression for Large Language Models with LeanKV
Unifying KV Cache Compression for Large Language Models with LeanKV
Yanqi Zhang
Yuwei Hu
Runyuan Zhao
John C. S. Lui
Haibo Chen
MQ
151
5
0
04 Dec 2024
Sneaking Syntax into Transformer Language Models with Tree Regularization
Sneaking Syntax into Transformer Language Models with Tree Regularization
Ananjan Nandi
Christopher D. Manning
Shikhar Murty
74
0
0
28 Nov 2024
BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration
Yuzong Chen
Ahmed F. AbouElhamayed
Xilai Dai
Yang Wang
Marta Andronic
George A. Constantinides
Mohamed S. Abdelfattah
MQ
110
1
0
18 Nov 2024
Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training
Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training
Elia Cunegatti
Leonardo Lucio Custode
Giovanni Iacca
52
0
0
11 Nov 2024
Energy-Based Diffusion Language Models for Text Generation
Energy-Based Diffusion Language Models for Text Generation
Minkai Xu
Tomas Geffner
Karsten Kreis
Weili Nie
Yilun Xu
J. Leskovec
Stefano Ermon
Arash Vahdat
DiffM
57
7
0
28 Oct 2024
emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography
emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography
Viswanath Sivakumar
Jeffrey Seely
Alan Du
Sean R Bittner
Adam Berenzweig
Anuoluwapo Bolarinwa
Alexandre Gramfort
Michael I Mandel
23
4
0
26 Oct 2024
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
Haocheng Xi
Han Cai
Ligeng Zhu
Yaojie Lu
Kurt Keutzer
Jianfei Chen
Song Han
MQ
75
9
0
25 Oct 2024
Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework
Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework
Esteban Garces Arias
Hannah Blocher
Julian Rodemann
Meimingwei Li
Christian Heumann
Matthias Aßenmacher
28
1
0
24 Oct 2024
ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment
ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment
Elyas Obbad
Iddah Mlauzi
Brando Miranda
Rylan Schaeffer
Kamal Obbad
Suhana Bedi
Sanmi Koyejo
CVBM
62
0
0
23 Oct 2024
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
Jinghan Jia
Jiancheng Liu
Yihua Zhang
Parikshit Ram
Nathalie Baracaldo
Sijia Liu
MU
40
2
0
23 Oct 2024
Tethering Broken Themes: Aligning Neural Topic Models with Labels and Authors
Tethering Broken Themes: Aligning Neural Topic Models with Labels and Authors
Mayank Nagda
Phil Ostheimer
Sophie Fellenz
51
0
0
22 Oct 2024
Self-calibration for Language Model Quantization and Pruning
Self-calibration for Language Model Quantization and Pruning
Miles Williams
G. Chrysostomou
Nikolaos Aletras
MQ
201
0
0
22 Oct 2024
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
R. Teo
Tan M. Nguyen
MoE
35
3
0
18 Oct 2024
From Gradient Clipping to Normalization for Heavy Tailed SGD
From Gradient Clipping to Normalization for Heavy Tailed SGD
Florian Hübler
Ilyas Fatkhullin
Niao He
40
5
0
17 Oct 2024
DAQ: Density-Aware Post-Training Weight-Only Quantization For LLMs
DAQ: Density-Aware Post-Training Weight-Only Quantization For LLMs
Yingsong Luo
Ling Chen
MQ
23
0
0
16 Oct 2024
Channel-Wise Mixed-Precision Quantization for Large Language Models
Channel-Wise Mixed-Precision Quantization for Large Language Models
Zihan Chen
Bike Xie
Jundong Li
Cong Shen
MQ
39
2
0
16 Oct 2024
Scaling laws for post-training quantized large language models
Scaling laws for post-training quantized large language models
Zifei Xu
Alexander Lan
W. Yazar
T. Webb
Sayeh Sharify
Xin Wang
MQ
35
0
0
15 Oct 2024
Quadratic Gating Functions in Mixture of Experts: A Statistical Insight
Quadratic Gating Functions in Mixture of Experts: A Statistical Insight
Pedram Akbarian
Huy Le Nguyen
Xing Han
Nhat Ho
MoE
42
0
0
15 Oct 2024
QSpec: Speculative Decoding with Complementary Quantization Schemes
QSpec: Speculative Decoding with Complementary Quantization Schemes
Juntao Zhao
Wenhao Lu
Sheng Wang
Lingpeng Kong
Chuan Wu
MQ
74
5
0
15 Oct 2024
Breaking the Memory Wall for Heterogeneous Federated Learning via Model
  Splitting
Breaking the Memory Wall for Heterogeneous Federated Learning via Model Splitting
Chunlin Tian
Li Li
Kahou Tam
Yebo Wu
Chengzhong Xu
FedML
34
3
0
12 Oct 2024
FlatQuant: Flatness Matters for LLM Quantization
FlatQuant: Flatness Matters for LLM Quantization
Yuxuan Sun
Ruikang Liu
Haoli Bai
Han Bao
Kang Zhao
...
Lu Hou
Chun Yuan
Xin Jiang
Wei Liu
Jun Yao
MQ
79
4
0
12 Oct 2024
ELICIT: LLM Augmentation via External In-Context Capability
ELICIT: LLM Augmentation via External In-Context Capability
Futing Wang
Jianhao Yan
Yue Zhang
Tao Lin
44
0
0
12 Oct 2024
More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing
More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing
Sagi Shaier
Francisco Pereira
K. Wense
Lawrence E Hunter
Matt Jones
MoE
46
0
0
10 Oct 2024
MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA
MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA
Hanrong Ye
Haotian Zhang
Erik Daxberger
Lin Chen
Zongyu Lin
...
Haoxuan You
Dan Xu
Zhe Gan
Jiasen Lu
Yinfei Yang
EgoV
MLLM
88
12
0
09 Oct 2024
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
Zhihao He
Hang Yu
Zi Gong
Shizhan Liu
J. Li
Weiyao Lin
VLM
38
1
0
09 Oct 2024
From Tokens to Words: On the Inner Lexicon of LLMs
From Tokens to Words: On the Inner Lexicon of LLMs
Guy Kaplan
Matanel Oren
Yuval Reif
Roy Schwartz
52
14
0
08 Oct 2024
Taylor Unswift: Secured Weight Release for Large Language Models via Taylor Expansion
Taylor Unswift: Secured Weight Release for Large Language Models via Taylor Expansion
Guanchu Wang
Yu-Neng Chuang
Ruixiang Tang
Shaochen Zhong
Jiayi Yuan
...
Zirui Liu
V. Chaudhary
Shuai Xu
James Caverlee
Xia Hu
PILM
84
1
0
06 Oct 2024
ARB-LLM: Alternating Refined Binarizations for Large Language Models
ARB-LLM: Alternating Refined Binarizations for Large Language Models
Zhiteng Li
Xinyu Yan
Tianao Zhang
Haotong Qin
Dong Xie
Jiang Tian
Zhongchao Shi
Linghe Kong
Yulun Zhang
Xiaokang Yang
MQ
37
2
0
04 Oct 2024
Demystifying the Token Dynamics of Deep Selective State Space Models
Demystifying the Token Dynamics of Deep Selective State Space Models
Thieu N. Vo
Tung D. Pham
Xin T. Tong
Tan Minh Nguyen
Mamba
54
0
0
04 Oct 2024
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
Xinpeng Wang
Chengzhi Hu
Paul Röttger
Barbara Plank
46
6
0
04 Oct 2024
Mitigating Memorization In Language Models
Mitigating Memorization In Language Models
Mansi Sakarvadia
Aswathy Ajith
Arham Khan
Nathaniel Hudson
Caleb Geniesse
Kyle Chard
Yaoqing Yang
Ian Foster
Michael W. Mahoney
KELM
MU
58
1
0
03 Oct 2024
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang
Jia wei
Pengle Zhang
Jun-Jie Zhu
Jun Zhu
Jianfei Chen
VLM
MQ
87
19
0
03 Oct 2024
Selective Attention Improves Transformer
Selective Attention Improves Transformer
Yaniv Leviathan
Matan Kalman
Yossi Matias
51
9
0
03 Oct 2024
Erasing Conceptual Knowledge from Language Models
Erasing Conceptual Knowledge from Language Models
Rohit Gandikota
Sheridan Feucht
Samuel Marks
David Bau
KELM
ELM
MU
52
6
0
03 Oct 2024
House of Cards: Massive Weights in LLMs
House of Cards: Massive Weights in LLMs
Jaehoon Oh
Seungjun Shin
Dokwan Oh
43
1
0
02 Oct 2024
Discrete Copula Diffusion
Discrete Copula Diffusion
Guy Van den Broeck
Oliver Broadrick
Mathias Niepert
Mathias Niepert
DiffM
61
4
0
02 Oct 2024
On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Kevin Xu
Issei Sato
39
3
0
02 Oct 2024
Racing Thoughts: Explaining Contextualization Errors in Large Language Models
Racing Thoughts: Explaining Contextualization Errors in Large Language Models
Michael A. Lepori
Michael Mozer
Asma Ghandeharioun
LRM
87
1
0
02 Oct 2024
Two Sparse Matrices are Better than One: Sparsifying Neural Networks
  with Double Sparse Factorization
Two Sparse Matrices are Better than One: Sparsifying Neural Networks with Double Sparse Factorization
Vladimír Boža
Vladimír Macko
30
1
0
27 Sep 2024
An Adversarial Perspective on Machine Unlearning for AI Safety
An Adversarial Perspective on Machine Unlearning for AI Safety
Jakub Łucki
Boyi Wei
Yangsibo Huang
Peter Henderson
F. Tramèr
Javier Rando
MU
AAML
77
33
0
26 Sep 2024
Accumulator-Aware Post-Training Quantization
Accumulator-Aware Post-Training Quantization
Ian Colbert
Fabian Grob
Giuseppe Franco
Jinjie Zhang
Rayan Saab
MQ
35
4
0
25 Sep 2024
MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts
MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts
Tianle Gu
Kexin Huang
Ruilin Luo
Yuanqi Yao
Yujiu Yang
Yan Teng
Yingchun Wang
MU
42
5
0
18 Sep 2024
Bio-Inspired Mamba: Temporal Locality and Bioplausible Learning in
  Selective State Space Models
Bio-Inspired Mamba: Temporal Locality and Bioplausible Learning in Selective State Space Models
Jiahao Qin
Mamba
AI4CE
28
1
0
17 Sep 2024
Retro-li: Small-Scale Retrieval Augmented Generation Supporting Noisy Similarity Searches and Domain Shift Generalization
Retro-li: Small-Scale Retrieval Augmented Generation Supporting Noisy Similarity Searches and Domain Shift Generalization
Gentiana Rashiti
G. Karunaratne
Mrinmaya Sachan
Abu Sebastian
Abbas Rahimi
RALM
44
0
0
12 Sep 2024
Representation Tuning
Representation Tuning
Christopher M. Ackerman
LLMSV
29
0
0
11 Sep 2024
Previous
123456...121314
Next