ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Randomized and Deterministic Attention Sparsification Algorithms for Over-parameterized Feature Dimension

10 April 2023
Yichuan Deng
Sridhar Mahadevan
Zhao Song
arXiv: 2304.04397

Papers citing "Randomized and Deterministic Attention Sparsification Algorithms for Over-parameterized Feature Dimension"

32 / 32 papers shown
Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform
Josh Alman
Zhao Song
27
12
0
17 May 2025
Scaling Law Phenomena Across Regression Paradigms: Multiple and Kernel Approaches
Yifang Chen
Xuyang Guo
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao Song
73
3
0
03 Mar 2025
When Can We Solve the Weighted Low Rank Approximation Problem in Truly Subquadratic Time?
Chenyang Li
Yingyu Liang
Zhenmei Shi
Zhao Song
36
3
0
24 Feb 2025
Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
Bo Chen
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao Song
98
20
0
15 Oct 2024
Binary Hypothesis Testing for Softmax Models and Leverage Score Models
Yeqi Gao
Yuzhou Gu
Zhao Song
37
0
0
09 May 2024
The Fine-Grained Complexity of Gradient Computation for Training Large Language Models
Josh Alman
Zhao Song
34
12
0
07 Feb 2024
Enhancing Stochastic Gradient Descent: A Unified Framework and Novel Acceleration Methods for Faster Convergence
Yichuan Deng
Zhao Song
Chiwun Yang
31
1
0
02 Feb 2024
One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space
Raghav Addanki
Chenyang Li
Zhao Song
Chiwun Yang
52
3
0
24 Nov 2023
The Expressibility of Polynomial based Attention Scheme
Zhao Song
Guangyi Xu
Junze Yin
39
5
0
30 Oct 2023
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Zichang Liu
Jue Wang
Tri Dao
Dinesh Manocha
Binhang Yuan
...
Anshumali Shrivastava
Ce Zhang
Yuandong Tian
Christopher Ré
Beidi Chen
BDL
35
194
0
26 Oct 2023
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Yeqi Gao
Zhao Song
Weixin Wang
Junze Yin
26
26
0
14 Sep 2023
Solving Attention Kernel Regression Problem via Pre-conditioner
Zhao Song
Junze Yin
Licheng Zhang
33
10
0
28 Aug 2023
Zero-th Order Algorithm for Softmax Attention Optimization
Yichuan Deng
Zhihang Li
Sridhar Mahadevan
Zhao Song
43
13
0
17 Jul 2023
Fast Quantum Algorithm for Attention Computation
Yeqi Gao
Zhao Song
Xin Yang
Ruizhe Zhang
LRM
36
20
0
16 Jul 2023
Efficient SGD Neural Network Training via Sublinear Activated Neuron Identification
Lianke Qin
Zhao Song
Yuanyuan Yang
30
9
0
13 Jul 2023
H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu Zhang
Ying Sheng
Dinesh Manocha
Tianlong Chen
Lianmin Zheng
...
Yuandong Tian
Christopher Ré
Clark W. Barrett
Zhangyang Wang
Beidi Chen
VLM
66
261
0
24 Jun 2023
InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding
Junda Wu
Tong Yu
Rui Wang
Zhao Song
Ruiyi Zhang
Handong Zhao
Chaochao Lu
Shuai Li
Ricardo Henao
VLM
44
23
0
08 Jun 2023
Query Complexity of Active Learning for Function Family With Nearly Orthogonal Basis
Xiangyi Chen
Zhao Song
Baochen Sun
Junze Yin
Danyang Zhuo
47
3
0
06 Jun 2023
A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models
Ritwik Sinha
Zhao Song
Dinesh Manocha
34
23
0
04 Jun 2023
Faster Robust Tensor Power Method for Arbitrary Order
Yichuan Deng
Zhao Song
Junze Yin
27
8
0
01 Jun 2023
Federated Empirical Risk Minimization via Second-Order Method
S. Bian
Zhao Song
Junze Yin
FedML
41
8
0
27 May 2023
Fast Submodular Function Maximization
Lianke Qin
Zhao Song
Yitan Wang
31
10
0
15 May 2023
Efficient Asynchronize Stochastic Gradient Algorithm with Structured Data
Zhao Song
Mingquan Ye
32
4
0
13 May 2023
Differentially Private Attention Computation
Yeqi Gao
Zhao Song
Xin Yang
55
21
0
08 May 2023
An Iterative Algorithm for Rescaled Hyperbolic Functions Regression
Yeqi Gao
Zhao Song
Junze Yin
36
33
0
01 May 2023
The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
Shuai Li
Zhao Song
Yu Xia
Tong Yu
Dinesh Manocha
41
38
0
26 Apr 2023
Attention Scheme Inspired Softmax Regression
Yichuan Deng
Zhihang Li
Zhao Song
44
42
0
20 Apr 2023
Solving Tensor Low Cycle Rank Approximation
Yichuan Deng
Yeqi Gao
Zhao Song
39
6
0
13 Apr 2023
Low Rank Matrix Completion via Robust Alternating Minimization in Nearly Linear Time
Yuzhou Gu
Zhao Song
Junze Yin
Licheng Zhang
23
26
0
21 Feb 2023
Bypass Exponential Time Preprocessing: Fast Neural Network Training via Weight-Data Correlation Preprocessing
Josh Alman
Jiehao Liang
Zhao Song
Ruizhe Zhang
Danyang Zhuo
84
31
0
25 Nov 2022
Dynamic Tensor Product Regression
Aravind Reddy
Zhao Song
Licheng Zhang
47
21
0
08 Oct 2022
Dynamic Maintenance of Kernel Density Estimation Data Structure: From Practice to Theory
Jiehao Liang
Zhao Song
Zhaozhuo Xu
Junze Yin
Danyang Zhuo
OOD
34
4
0
08 Aug 2022