Zero-th Order Algorithm for Softmax Attention Optimization

17 July 2023
Yichuan Deng
Zhihang Li
Sridhar Mahadevan
Zhao-quan Song
arXiv: 2307.08352

Papers citing "Zero-th Order Algorithm for Softmax Attention Optimization"

14 / 14 papers shown
Scaling Law Phenomena Across Regression Paradigms: Multiple and Kernel Approaches
Yifang Chen
Xuyang Guo
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao-quan Song
65
3
0
03 Mar 2025
Binary Hypothesis Testing for Softmax Models and Leverage Score Models
Yeqi Gao
Yuzhou Gu
Zhao-quan Song
33
0
0
09 May 2024
The Fine-Grained Complexity of Gradient Computation for Training Large Language Models
Josh Alman
Zhao-quan Song
21
12
0
07 Feb 2024
Enhancing Stochastic Gradient Descent: A Unified Framework and Novel Acceleration Methods for Faster Convergence
Yichuan Deng
Zhao-quan Song
Chiwun Yang
26
1
0
02 Feb 2024
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Yeqi Gao
Zhao-quan Song
Weixin Wang
Junze Yin
20
25
0
14 Sep 2023
How to Protect Copyright Data in Optimization of Large Language Models?
T. Chu
Zhao-quan Song
Chiwun Yang
34
29
0
23 Aug 2023
Convergence of Two-Layer Regression with Nonlinear Units
Yichuan Deng
Zhao-quan Song
Shenghao Xie
21
7
0
16 Aug 2023
H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Zhenyu (Allen) Zhang
Ying Sheng
Tianyi Zhou
Tianlong Chen
Lianmin Zheng
...
Yuandong Tian
Christopher Ré
Clark W. Barrett
Zhangyang Wang
Beidi Chen
VLM
47
252
0
24 Jun 2023
Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness
E. Zelikman
Qian Huang
Percy Liang
Nick Haber
Noah D. Goodman
62
14
0
16 Jun 2023
An Iterative Algorithm for Rescaled Hyperbolic Functions Regression
Yeqi Gao
Zhao-quan Song
Junze Yin
28
33
0
01 May 2023
Solving Tensor Low Cycle Rank Approximation
Yichuan Deng
Yeqi Gao
Zhao-quan Song
31
6
0
13 Apr 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
295
2,232
0
22 Mar 2023
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
Yuchen Li
Yuan-Fang Li
Andrej Risteski
120
61
0
07 Mar 2023
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
Hamed Karimi
J. Nutini
Mark W. Schmidt
139
1,199
0
16 Aug 2016