From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
arXiv: 2410.05459 (v2, latest)
7 October 2024
Kaiyue Wen, Huaqing Zhang, Hongzhou Lin, Jingzhao Zhang
Communities: MoE, LRM
Papers citing "From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency" (8 of 58 papers shown)
| Title | Authors | Tags | Counts | Date |
| --- | --- | --- | --- | --- |
| Learning Parities with Neural Networks | Amit Daniely, Eran Malach | - | 104 / 78 / 0 | 18 Feb 2020 |
| Scaling Laws for Neural Language Models | Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei | - | 673 / 4,945 / 0 | 23 Jan 2020 |
| Poly-time universality and limitations of deep learning | Emmanuel Abbe, Colin Sandon | - | 62 / 23 / 0 | 07 Jan 2020 |
| PyTorch: An Imperative Style, High-Performance Deep Learning Library | Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, ..., Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala | ODL | 649 / 42,859 / 0 | 03 Dec 2019 |
| Provable limitations of deep learning | Emmanuel Abbe, Colin Sandon | AAML | 75 / 45 / 0 | 16 Dec 2018 |
| Failures of Gradient-Based Deep Learning | Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah | ODL, UQCV | 129 / 201 / 0 | 23 Mar 2017 |
| Densely Connected Convolutional Networks | Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger | PINN, 3DV | 1.2K / 37,032 / 0 | 25 Aug 2016 |
| Adam: A Method for Stochastic Optimization | Diederik P. Kingma, Jimmy Ba | ODL | 2.5K / 150,696 / 0 | 22 Dec 2014 |