Probing Classifiers: Promises, Shortcomings, and Advances

24 February 2021
Yonatan Belinkov

Papers citing "Probing Classifiers: Promises, Shortcomings, and Advances"

50 / 71 papers shown
Designing and Contextualising Probes for African Languages
Wisdom Aduah
Francois Meyer
74
0
0
15 May 2025
Geospatial Mechanistic Interpretability of Large Language Models
Stef De Sabbata
Stefano Mizzaro
Kevin Roitero
AI4CE
103
0
0
06 May 2025
Decoding Vision Transformers: the Diffusion Steering Lens
Ryota Takatsuki
Sonia Joseph
Ippei Fujisawa
Ryota Kanai
DiffM
83
0
0
18 Apr 2025
Linguistic Interpretability of Transformer-based Language Models: a systematic review
Miguel López-Otal
Jorge Gracia
Jordi Bernad
Carlos Bobed
Lucía Pitarch-Ballesteros
Emma Anglés-Herrero
VLM
94
1
0
09 Apr 2025
Learning on LLM Output Signatures for gray-box Behavior Analysis
Guy Bar-Shalom
Fabrizio Frasca
Derek Lim
Yoav Gelberg
Yftah Ziser
Ran El-Yaniv
Gal Chechik
Haggai Maron
113
0
0
18 Mar 2025
ASIDE: Architectural Separation of Instructions and Data in Language Models
Egor Zverev
Evgenii Kortukov
Alexander Panfilov
Soroush Tabesh
Alexandra Volkova
Sebastian Lapuschkin
Wojciech Samek
Christoph H. Lampert
AAML
104
2
0
13 Mar 2025
Gender Encoding Patterns in Pretrained Language Model Representations
Mahdi Zakizadeh
Mohammad Taher Pilehvar
196
0
0
09 Mar 2025
Linear Representations of Political Perspective Emerge in Large Language Models
Junsol Kim
James Evans
Aaron Schein
124
6
0
03 Mar 2025
Superscopes: Amplifying Internal Feature Representations for Language Model Interpretation
Jonathan Jacobi
Gal Niv
LRM
ReLM
119
0
0
03 Mar 2025
Model Lakes
Koyena Pal
David Bau
Renée J. Miller
142
2
0
24 Feb 2025
Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning
Lefei Zhang
Lijie Hu
Di Wang
LRM
161
4
0
17 Feb 2025
Superpose Singular Features for Model Merging
Haiquan Qiu
You Wu
Quanming Yao
MoMe
139
0
0
15 Feb 2025
Sample-efficient Learning of Concepts with Theoretical Guarantees: from Data to Concepts without Interventions
H. Fokkema
T. Erven
Sara Magliacane
114
2
0
10 Feb 2025
Mechanistic Interpretability of Emotion Inference in Large Language Models
Ala Nekouvaght Tak
Amin Banayeeanzade
Anahita Bolourani
Mina Kian
Robin Jia
Jonathan Gratch
100
0
0
08 Feb 2025
Discovering Chunks in Neural Embeddings for Interpretability
Shuchen Wu
Stephan Alaniz
Eric Schulz
Zeynep Akata
82
0
0
03 Feb 2025
The Geometry of Tokens in Internal Representations of Large Language Models
Karthik Viswanathan
Yuri Gardinazzi
Giada Panerai
Alberto Cazzaniga
Matteo Biagetti
AIFin
134
7
0
17 Jan 2025
GPT or BERT: why not both?
Lucas Georges Gabriel Charpentier
David Samuel
142
5
0
31 Dec 2024
JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit
Zeqing He
Peng Kuang
Zhixuan Chu
Huiyu Xu
Rui Zheng
Kui Ren
Chun Chen
99
7
0
17 Nov 2024
Towards Unifying Interpretability and Control: Evaluation via Intervention
Usha Bhalla
Suraj Srinivas
Asma Ghandeharioun
Himabindu Lakkaraju
90
11
0
07 Nov 2024
Focus On This, Not That! Steering LLMs with Adaptive Feature Specification
Tom A. Lamb
Adam Davies
Alasdair Paren
Philip Torr
Francesco Pinto
108
0
0
30 Oct 2024
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
Yaniv Nikankin
Anja Reusch
Aaron Mueller
Yonatan Belinkov
AIFin
LRM
99
32
0
28 Oct 2024
Do LLMs "know" internally when they follow instructions?
Juyeon Heo
Christina Heinze-Deml
Oussama Elachqar
Shirley Ren
Udhay Nallasamy
Andy Miller
Kwan Ho Ryan Chan
Jaya Narain
85
10
0
18 Oct 2024
Inference and Verbalization Functions During In-Context Learning
Junyi Tao
Xiaoyin Chen
Nelson F. Liu
LRM
ReLM
75
1
0
12 Oct 2024
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Hadas Orgad
Michael Toker
Zorik Gekhman
Roi Reichart
Idan Szpektor
Hadas Kotek
Yonatan Belinkov
HILM
AIFin
99
43
0
03 Oct 2024
Joint Estimation and Prediction of City-wide Delivery Demand: A Large Language Model Empowered Graph-based Learning Approach
Tong Nie
Junlin He
Yuewen Mei
Guoyang Qin
Guilong Li
Jian Sun
Wei Ma
86
4
0
30 Aug 2024
Understanding Generative AI Content with Embedding Models
Max Vargas
Reilly Cannon
A. Engel
Anand D. Sarwate
Tony Chiang
187
3
0
19 Aug 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
149
32
0
02 Jul 2024
What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
Michal Golovanevsky
William Rudman
Vedant Palit
Ritambhara Singh
Carsten Eickhoff
99
2
0
24 Jun 2024
What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages
Nadav Borenstein
Anej Svete
R. Chan
Josef Valvoda
Franz Nowak
Isabelle Augenstein
Eleanor Chodroff
Ryan Cotterell
72
13
0
06 Jun 2024
PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration
Huiping Zhuang
Jianwei Wang
Zhengdong Lu
Huiping Zhuang
Haoran Li
Huiping Zhuang
Cen Chen
RALM
KELM
79
8
0
03 Jun 2024
Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories
Tianlong Wang
Xianfeng Jiao
Yifan He
Zhongzhi Chen
Yinghao Zhu
Xu Chu
Junyi Gao
Yasha Wang
Liantao Ma
LLMSV
105
13
0
26 May 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
133
151
0
28 Mar 2024
Do Large Language Models Mirror Cognitive Language Processing?
Yuqi Ren
Renren Jin
Tongxuan Zhang
Deyi Xiong
91
6
0
28 Feb 2024
Uncovering Intermediate Variables in Transformers using Circuit Probing
Michael A. Lepori
Thomas Serre
Ellie Pavlick
118
7
0
07 Nov 2023
Similarity of Neural Network Models: A Survey of Functional and Representational Measures
Max Klabunde
Tobias Schumacher
M. Strohmaier
Florian Lemmerich
133
73
0
10 May 2023
What if This Modified That? Syntactic Interventions via Counterfactual Embeddings
Mycal Tucker
Peng Qian
R. Levy
53
39
0
28 May 2021
DirectProbe: Studying Representations without Classifiers
Yichu Zhou
Vivek Srikumar
70
29
0
13 Apr 2021
Low-Complexity Probing via Finding Subnetworks
Steven Cao
Victor Sanh
Alexander M. Rush
43
54
0
08 Apr 2021
Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis
Michael A. Lepori
R. Thomas McCoy
50
24
0
24 Nov 2020
When Do You Need Billions of Words of Pretraining Data?
Yian Zhang
Alex Warstadt
Haau-Sing Li
Samuel R. Bowman
58
141
0
10 Nov 2020
Pareto Probing: Trading Off Accuracy for Complexity
Tiago Pimentel
Naomi Saphra
Adina Williams
Ryan Cotterell
55
60
0
05 Oct 2020
An information theoretic view on selecting linguistic probes
Zining Zhu
Frank Rudzicz
42
19
0
15 Sep 2020
CausaLM: Causal Model Explanation Through Counterfactual Language Models
Amir Feder
Nadav Oved
Uri Shalit
Roi Reichart
CML
LRM
92
161
0
27 May 2020
A Tale of a Probe and a Parser
Rowan Hall Maudslay
Josef Valvoda
Tiago Pimentel
Adina Williams
Ryan Cotterell
51
55
0
04 May 2020
DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Qingqing Cao
H. Trivedi
A. Balasubramanian
Niranjan Balasubramanian
66
68
0
02 May 2020
Investigating Transferability in Pretrained Language Models
Alex Tamkin
Trisha Singh
D. Giovanardi
Noah D. Goodman
MILM
62
48
0
30 Apr 2020
Asking without Telling: Exploring Latent Ontologies in Contextual Representations
Julian Michael
Jan A. Botha
Ian Tenney
43
43
0
29 Apr 2020
Analyzing analytical methods: The case of phonology in neural models of spoken language
Grzegorz Chrupała
Bertrand Higy
Afra Alishahi
42
20
0
15 Apr 2020
Information-Theoretic Probing with Minimum Description Length
Elena Voita
Ivan Titov
82
275
0
27 Mar 2020
A Primer in BERTology: What we know about how BERT works
Anna Rogers
Olga Kovaleva
Anna Rumshisky
OffRL
87
1,497
0
27 Feb 2020