ResearchTrend.AI
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

12 March 2024
Zhengxuan Wu
Atticus Geiger
Aryaman Arora
Jing-ling Huang
Zheng Wang
Noah D. Goodman
Christopher D. Manning
Christopher Potts
    MU

Papers citing "pyvene: A Library for Understanding and Improving PyTorch Models via Interventions"

10 / 10 papers shown
MIB: A Mechanistic Interpretability Benchmark
Aaron Mueller
Atticus Geiger
Sarah Wiegreffe
Dana Arad
Iván Arcuschin
...
Alessandro Stolfo
Martin Tutek
Amir Zur
David Bau
Yonatan Belinkov
43
1
0
17 Apr 2025
Enhancing Hallucination Detection through Noise Injection
Litian Liu
Reza Pourreza
Sunny Panchal
Apratim Bhattacharyya
Yao Qin
Roland Memisevic
HILM
75
2
0
06 Feb 2025
Controllable Context Sensitivity and the Knob Behind It
Julian Minder
Kevin Du
Niklas Stoehr
Giovanni Monea
Chris Wendler
Robert West
Ryan Cotterell
KELM
49
3
0
11 Nov 2024
Personality Alignment of Large Language Models
Minjun Zhu
Linyi Yang
Yue Zhang
ALM
64
5
0
21 Aug 2024
Penzai + Treescope: A Toolkit for Interpreting, Visualizing, and Editing Models As Data
Mingshu Li
36
3
0
01 Aug 2024
Monitoring Latent World States in Language Models with Propositional Probes
Jiahai Feng
Stuart Russell
Jacob Steinhardt
HILM
43
6
0
27 Jun 2024
Uncovering Intermediate Variables in Transformers using Circuit Probing
Michael A. Lepori
Thomas Serre
Ellie Pavlick
75
7
0
07 Nov 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
191
261
0
28 Apr 2023
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Atticus Geiger
Zhengxuan Wu
Christopher Potts
Thomas F. Icard
Noah D. Goodman
CML
75
98
0
05 Mar 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
212
494
0
01 Nov 2022