
TRACE for Tracking the Emergence of Semantic Representations in Transformers (arXiv:2505.17998)

23 May 2025
Nura Aljaafari, Danilo S. Carvalho, André Freitas

Papers citing "TRACE for Tracking the Emergence of Semantic Representations in Transformers"

16 papers shown
Layer by Layer: Uncovering Hidden Representations in Language Models
Oscar Skean, Md Rifat Arefin, Dan Zhao, Niket Patel, Jalal Naghiyev, Yann LeCun, Ravid Shwartz-Ziv
04 Feb 2025 · MILM, AIFin

A Complexity-Based Theory of Compositionality
Eric Elmoznino, Thomas Jiralerspong, Yoshua Bengio, Guillaume Lajoie
18 Oct 2024 · CoGe

Geometric Signatures of Compositionality Across a Language Model's Lifetime
Jin Hwa Lee, Thomas Jiralerspong, Lei Yu, Yoshua Bengio, Emily Cheng
02 Oct 2024 · CoGe

The Geometry of Categorical and Hierarchical Concepts in Large Language Models
Kiho Park, Yo Joong Choe, Yibo Jiang, Victor Veitch
03 Jun 2024

Emergence of a High-Dimensional Abstraction Phase in Language Transformers
Emily Cheng, Diego Doimo, Corentin Kervadec, Iuri Macocco, Jade Yu, Alessandro Laio, Marco Baroni
24 May 2024

LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
Rui Pan, Xiang Liu, Shizhe Diao, Renjie Pi, Jipeng Zhang, Chi Han, Tong Zhang
26 Mar 2024

What Algorithms can Transformers Learn? A Study in Length Generalization
Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran
24 Oct 2023

The geometry of hidden representations of large transformer models
L. Valeriani, Diego Doimo, F. Cuturello, Alessandro Laio, A. Ansuini, Alberto Cazzaniga
01 Feb 2023 · MILM

Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks
Yuxuan Li, James L. McClelland
02 Oct 2022

In-context Learning and Induction Heads
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah
24 Sep 2022

Visualizing high-dimensional loss landscapes with Hessian directions
Lucas Böttcher, Gregory R. Wheeler
28 Aug 2022

Knowledge Neurons in Pretrained Transformers
Damai Dai, Li Dong, Y. Hao, Zhifang Sui, Baobao Chang, Furu Wei
18 Apr 2021 · KELM, MU

Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
Armen Aghajanyan, Luke Zettlemoyer, Sonal Gupta
22 Dec 2020

Analysing Mathematical Reasoning Abilities of Neural Models
D. Saxton, Edward Grefenstette, Felix Hill, Pushmeet Kohli
02 Apr 2019 · LRM

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
12 Jun 2017 · 3DV

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
15 Sep 2016 · ODL