Pretrained transformer efficiently learns low-dimensional target functions in-context

4 November 2024
Kazusato Oko, Yujin Song, Taiji Suzuki, Denny Wu
arXiv:2411.02544 · abs · PDF · HTML

Papers citing "Pretrained transformer efficiently learns low-dimensional target functions in-context"

21 papers shown
Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions
Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Luca Pesce, Ludovic Stephan
24 May 2024

Asymptotic theory of in-context learning by linear attention
Yue M. Lu, Mary I. Letey, Jacob A. Zavatone-Veth, Anindita Maiti, Cengiz Pehlevan
20 May 2024

How do Transformers perform In-Context Autoregressive Learning?
Michael E. Sander, Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré
08 Feb 2024

An Information-Theoretic Analysis of In-Context Learning
Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy
28 Jan 2024

Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks
Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai
13 Jul 2023

Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of Neural Networks with Polynomial Width, Samples, and Time
Arvind V. Mahankali, Jeff Z. HaoChen, Kefan Dong, Margalit Glasgow, Tengyu Ma
28 Jun 2023 · MLT

Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection
Yu Bai, Fan Chen, Haiquan Wang, Caiming Xiong, Song Mei
07 Jun 2023

How Two-Layer Neural Networks Learn, One (Giant) Step at a Time
Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan
29 May 2023 · MLT

Learning time-scales in two-layers neural networks
Raphael Berthier, Andrea Montanari, Kangjie Zhou
28 Feb 2023

Transformers learn in-context by gradient descent
J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov
15 Dec 2022 · MLT

Learning Single-Index Models with Shallow Neural Networks
A. Bietti, Joan Bruna, Clayton Sanford, M. Song
27 Oct 2022

Neural Networks Efficiently Learn Low-Dimensional Representations with SGD
Alireza Mousavi-Hosseini, Sejun Park, M. Girotti, Ioannis Mitliagkas, Murat A. Erdogdu
29 Sep 2022 · MLT

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant
01 Aug 2022

Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang
18 Jul 2022

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang
03 May 2022 · MLT

How rotational invariance of common kernels prevents generalization in high dimensions
Konstantin Donhauser, Mingqi Wu, Fanny Yang
09 Apr 2021

Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
28 May 2020 · BDL

Risks from Learned Optimization in Advanced Machine Learning Systems
Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant
05 Jun 2019

Linearized two-layers neural networks in high dimension
Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari
27 Apr 2019 · MLT

Fundamental Limits of Weak Recovery with Applications to Phase Retrieval
Marco Mondelli, Andrea Montanari
20 Aug 2017

Optimal Errors and Phase Transitions in High-Dimensional Generalized Linear Models
Jean Barbier, Florent Krzakala, N. Macris, Léo Miolane, Lenka Zdeborová
10 Aug 2017