ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2410.05459
  4. Cited By
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
v1v2 (latest)

From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency

7 October 2024
Kaiyue Wen
Huaqing Zhang
Hongzhou Lin
Jingzhao Zhang
    MoELRM
ArXiv (abs)PDFHTML

Papers citing "From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency"

8 / 58 papers shown
Title
Learning Parities with Neural Networks
Learning Parities with Neural Networks
Amit Daniely
Eran Malach
104
78
0
18 Feb 2020
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
673
4,945
0
23 Jan 2020
Poly-time universality and limitations of deep learning
Poly-time universality and limitations of deep learning
Emmanuel Abbe
Colin Sandon
62
23
0
07 Jan 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
649
42,859
0
03 Dec 2019
Provable limitations of deep learning
Provable limitations of deep learning
Emmanuel Abbe
Colin Sandon
AAML
75
45
0
16 Dec 2018
Failures of Gradient-Based Deep Learning
Failures of Gradient-Based Deep Learning
Shai Shalev-Shwartz
Ohad Shamir
Shaked Shammah
ODLUQCV
129
201
0
23 Mar 2017
Densely Connected Convolutional Networks
Densely Connected Convolutional Networks
Gao Huang
Zhuang Liu
Laurens van der Maaten
Kilian Q. Weinberger
PINN3DV
1.2K
37,032
0
25 Aug 2016
Adam: A Method for Stochastic Optimization
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
2.5K
150,696
0
22 Dec 2014
Previous
12