Explaining Neural Networks by Decoding Layer Activations

27 May 2020
Johannes Schneider
Michalis Vlachos

Papers citing "Explaining Neural Networks by Decoding Layer Activations"

4 papers
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller
28 Mar 2024
Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models
Peter Hase, Joey Tianyi Zhou, Been Kim, Asma Ghandeharioun
10 Jan 2023
Concept-based Adversarial Attacks: Tricking Humans and Classifiers Alike
Johannes Schneider, Giovanni Apruzzese
18 Mar 2022
Deceptive AI Explanations: Creation and Detection
Johannes Schneider, Christian Meske, Michalis Vlachos
21 Jan 2020