ResearchTrend.AI
Understanding intermediate layers using linear classifier probes


5 October 2016
Guillaume Alain
Yoshua Bengio
    FAtt

Papers citing "Understanding intermediate layers using linear classifier probes"

50 / 187 papers shown
Designing a Dashboard for Transparency and Control of Conversational AI
Yida Chen
Aoyu Wu
Trevor DePodesta
Catherine Yeh
Kenneth Li
...
Jan Riecke
Shivam Raval
Olivia Seow
Martin Wattenberg
Fernanda Viégas
44
16
0
12 Jun 2024
Standards for Belief Representations in LLMs
Daniel A. Herrmann
B. Levinstein
42
7
0
31 May 2024
On Fairness of Low-Rank Adaptation of Large Models
Zhoujie Ding
Ken Ziyu Liu
Pura Peetathawatchai
Berivan Isik
Sanmi Koyejo
48
4
0
27 May 2024
Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories
Tianlong Wang
Xianfeng Jiao
Yifan He
Zhongzhi Chen
Yinghao Zhu
Xu Chu
Junyi Gao
Yasha Wang
Liantao Ma
LLMSV
68
7
0
26 May 2024
A Multi-Perspective Analysis of Memorization in Large Language Models
Bowen Chen
Namgi Han
Yusuke Miyao
46
1
0
19 May 2024
Linear Explanations for Individual Neurons
Tuomas P. Oikarinen
Tsui-Wei Weng
FAtt
MILM
31
6
0
10 May 2024
A separability-based approach to quantifying generalization: which layer is best?
Luciano Dyballa
Evan Gerritz
Steven W. Zucker
OOD
37
3
0
02 May 2024
Comparison of self-supervised in-domain and supervised out-domain transfer learning for bird species recognition
H. Ghaffari
Paul Devos
45
0
0
26 Apr 2024
Does Transformer Interpretability Transfer to RNNs?
Gonçalo Paulo
Thomas Marshall
Nora Belrose
63
6
0
09 Apr 2024
Joint-Embedding Masked Autoencoder for Self-supervised Learning of Dynamic Functional Connectivity from the Human Brain
Jungwon Choi
Hyungi Lee
Byung-Hoon Kim
Juho Lee
80
0
0
11 Mar 2024
Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations
GuanWen Qiu
Da Kuang
Surbhi Goel
27
8
0
05 Mar 2024
Language Models Represent Beliefs of Self and Others
Wentao Zhu
Zhining Zhang
Yizhou Wang
MILM
LRM
50
8
0
28 Feb 2024
Descriptive Kernel Convolution Network with Improved Random Walk Kernel
Meng-Chieh Lee
Lingxiao Zhao
L. Akoglu
23
3
0
08 Feb 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
34
78
0
25 Jan 2024
Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?
Sonia Laguna
Ricards Marcinkevics
Moritz Vandenhirtz
Julia E. Vogt
35
17
0
24 Jan 2024
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Asma Ghandeharioun
Avi Caciularu
Adam Pearce
Lucas Dixon
Mor Geva
34
87
0
11 Jan 2024
Enhancing Contrastive Learning with Efficient Combinatorial Positive Pairing
Jaeill Kim
Duhun Hwang
Eunjung Lee
Jangwon Suh
Jimyeong Kim
Wonjong Rhee
33
0
0
11 Jan 2024
FlexModel: A Framework for Interpretability of Distributed Large Language Models
Matthew Choi
Muhammad Adil Asif
John Willes
David Emerson
AI4CE
ALM
27
1
0
05 Dec 2023
Revisiting Topic-Guided Language Models
Carolina Zheng
Keyon Vafa
David M. Blei
BDL
29
1
0
04 Dec 2023
Identifying Spurious Correlations using Counterfactual Alignment
Joseph Paul Cohen
Louis Blankemeier
Akshay S. Chaudhari
CML
55
1
0
01 Dec 2023
Looped Transformers are Better at Learning Learning Algorithms
Liu Yang
Kangwook Lee
Robert D. Nowak
Dimitris Papailiopoulos
24
24
0
21 Nov 2023
Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots
Ruixiang Tang
Jiayi Yuan
Yiming Li
Zirui Liu
Rui Chen
Xia Hu
AAML
36
13
0
28 Oct 2023
Codebook Features: Sparse and Discrete Interpretability for Neural Networks
Alex Tamkin
Mohammad Taufeeque
Noah D. Goodman
35
27
0
26 Oct 2023
Reset It and Forget It: Relearning Last-Layer Weights Improves Continual and Transfer Learning
Lapo Frati
Neil Traft
Jeff Clune
Nick Cheney
CLL
27
0
0
12 Oct 2023
Language Models Represent Space and Time
Wes Gurnee
Max Tegmark
47
142
0
03 Oct 2023
Uncovering the Hidden Cost of Model Compression
Diganta Misra
Muawiz Chaudhary
Agam Goyal
Bharat Runwal
Pin-Yu Chen
VLM
36
0
0
29 Aug 2023
Causal Intersectionality and Dual Form of Gradient Descent for Multimodal Analysis: a Case Study on Hateful Memes
Yosuke Miyanishi
M. Nguyen
34
2
0
19 Aug 2023
Concept backpropagation: An Explainable AI approach for visualising learned concepts in neural network models
Patrik Hammersborg
Inga Strümke
FAtt
26
0
0
24 Jul 2023
Systematic Architectural Design of Scale Transformed Attention Condenser DNNs via Multi-Scale Class Representational Response Similarity Analysis
Andrew Hryniowski
Alexander Wong
16
0
0
16 Jun 2023
LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning
Jifan Zhang
Yifang Chen
Gregory H. Canal
Stephen Mussmann
Arnav M. Das
...
Yinglun Zhu
Jeffrey Bilmes
S. Du
Kevin G. Jamieson
Robert D. Nowak
VLM
33
10
0
16 Jun 2023
From 'Snippet-lects' to Doculects and Dialects: Leveraging Neural Representations of Speech for Placing Audio Signals in a Language Landscape
Severine Guillaume
Guillaume Wisniewski
Alexis Michaud
23
2
0
29 May 2023
Gaussian Process Probes (GPP) for Uncertainty-Aware Probing
Zehao Wang
Alexander Ku
Jason Baldridge
Thomas L. Griffiths
Been Kim
UQCV
26
11
0
29 May 2023
Reverse Engineering Self-Supervised Learning
Ido Ben-Shaul
Ravid Shwartz-Ziv
Tomer Galanti
S. Dekel
Yann LeCun
SSL
23
34
0
24 May 2023
COLA: A Benchmark for Compositional Text-to-image Retrieval
Arijit Ray
Filip Radenovic
Abhimanyu Dubey
Bryan A. Plummer
Ranjay Krishna
Kate Saenko
CoGe
VLM
41
34
0
05 May 2023
VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution
Jaeill Kim
Suhyun Kang
Duhun Hwang
Jungwook Shin
Wonjong Rhee
DRL
13
21
0
04 Apr 2023
Eliciting Latent Predictions from Transformers with the Tuned Lens
Nora Belrose
Zach Furman
Logan Smith
Danny Halawi
Igor V. Ostrovsky
Lev McKinney
Stella Biderman
Jacob Steinhardt
22
193
0
14 Mar 2023
SR-init: An interpretable layer pruning method
Hui Tang
Yao Lu
Qi Xuan
15
8
0
14 Mar 2023
Revisiting Pre-training in Audio-Visual Learning
Ruoxuan Feng
Wenke Xia
Di Hu
30
1
0
07 Feb 2023
Identifiability of latent-variable and structural-equation models: from linear to nonlinear
Aapo Hyvarinen
Ilyes Khemakhem
R. Monti
CML
30
41
0
06 Feb 2023
Trustworthy Social Bias Measurement
Rishi Bommasani
Percy Liang
27
10
0
20 Dec 2022
A Natural Bias for Language Generation Models
Clara Meister
Wojciech Stokowiec
Tiago Pimentel
Lei Yu
Laura Rimell
A. Kuncoro
MILM
33
6
0
19 Dec 2022
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning
Shachar Don-Yehiya
Elad Venezian
Colin Raffel
Noam Slonim
Yoav Katz
Leshem Choshen
MoMe
28
52
0
02 Dec 2022
Supervised Pretraining for Molecular Force Fields and Properties Prediction
Xiang Gao
Weihao Gao
Wen Xiao
Zhirui Wang
Chong Wang
Liang Xiang
AI4CE
25
8
0
23 Nov 2022
Layer-Stack Temperature Scaling
Amr Khalifa
Michael C. Mozer
Hanie Sedghi
Behnam Neyshabur
Ibrahim M. Alabdulmohsin
78
2
0
18 Nov 2022
Emergence of Concepts in DNNs?
Tim Räz
21
0
0
11 Nov 2022
Reinforcement Learning in an Adaptable Chess Environment for Detecting Human-understandable Concepts
Patrik Hammersborg
Inga Strümke
17
5
0
10 Nov 2022
COPEN: Probing Conceptual Knowledge in Pre-trained Language Models
Hao Peng
Xiaozhi Wang
Shengding Hu
Hailong Jin
Lei Hou
Juanzi Li
Zhiyuan Liu
Qun Liu
18
22
0
08 Nov 2022
A Law of Data Separation in Deep Learning
Hangfeng He
Weijie J. Su
OOD
24
36
0
31 Oct 2022
Probing for targeted syntactic knowledge through grammatical error detection
Christopher Davis
Christopher Bryant
Andrew Caines
Marek Rei
P. Buttery
22
3
0
28 Oct 2022
The Curious Case of Benign Memorization
Sotiris Anagnostidis
Gregor Bachmann
Lorenzo Noci
Thomas Hofmann
AAML
49
8
0
25 Oct 2022