Shared Global and Local Geometry of Language Model Embeddings

27 March 2025

Papers citing "Shared Global and Local Geometry of Language Model Embeddings"

Title
Jailbreak Strength and Model Similarity Predict Transferability Rico Angell Jannik Brinkmann He He 24 0 0 15 Jun 2025
Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit Charles Goddard Fernando Fernandes Neto 30 0 0 07 Jun 2025
Transferring Features Across Language Models With Model Stitching Alan Chen Jack Merullo Alessandro Stolfo Ellie Pavlick 35 0 0 07 Jun 2025
Do different prompting methods yield a common task representation in language models? Guy Davidson Todd M. Gureckis Brenden M. Lake Adina Williams 58 2 0 17 May 2025
Probing the Vulnerability of Large Language Models to Polysemantic Interventions Bofan Gong Shiyang Lai Dawn Song AAML MILM 72 1 0 16 May 2025
The Geometry of Self-Verification in a Task-Specific Reasoning Model Andrew Lee Lihao Sun Chris Wendler Fernanda Viégas Martin Wattenberg LRM 172 1 0 19 Apr 2025
RespDiff: An End-to-End Multi-scale RNN Diffusion Model for Respiratory Waveform Estimation from PPG Signals Yuyang Miao Zehua Chen Chong Li Danilo Mandic DiffM MedIm 79 9 0 06 Oct 2024
Gemma 2: Improving Open Language Models at a Practical Size Gemma Team Morgane Riviere Shreya Pathak Pier Giuseppe Sessa Cassidy Hardin ... Noah Fiedel Armand Joulin Kathleen Kenealy Robert Dadashi Alek Andreev VLM MoE OSLM 149 922 0 31 Jul 2024
The Geometry of Categorical and Hierarchical Concepts in Large Language Models Kiho Park Yo Joong Choe Yibo Jiang Victor Veitch 133 41 0 03 Jun 2024
The Platonic Representation Hypothesis Minyoung Huh Brian Cheung Tongzhou Wang Phillip Isola 138 142 0 13 May 2024
Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models Sander Land Max Bartolo 116 25 0 08 May 2024
Universal Neurons in GPT2 Language Models Wes Gurnee Theo Horsley Zifan Carl Guo Tara Rezaei Kheirkhah Qinyi Sun Will Hathaway Neel Nanda Dimitris Bertsimas MILM 158 47 0 22 Jan 2024
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity Andrew Lee Xiaoyan Bai Itamar Pres Martin Wattenberg Jonathan K. Kummerfeld Rada Mihalcea 147 121 0 03 Jan 2024
Steering Llama 2 via Contrastive Activation Addition Nina Rimsky Nick Gabrieli Julian Schulz Meg Tong Evan Hubinger Alexander Matt Turner LLMSV 61 226 0 09 Dec 2023
The Linear Representation Hypothesis and the Geometry of Large Language Models Kiho Park Yo Joong Choe Victor Veitch LLMSV MILM 176 190 0 07 Nov 2023
Circuit Component Reuse Across Tasks in Transformer Language Models Jack Merullo Carsten Eickhoff Ellie Pavlick 84 71 0 12 Oct 2023
Emergent Linear Representations in World Models of Self-Supervised Sequence Models Neel Nanda Andrew Lee Martin Wattenberg FAtt MILM 122 186 0 02 Sep 2023
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model Kenneth Li Oam Patel Fernanda Viégas Hanspeter Pfister Martin Wattenberg KELM HILM 160 584 0 06 Jun 2023
A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations Bilal Chughtai Lawrence Chan Neel Nanda 116 103 0 06 Feb 2023
Discovering Language Model Behaviors with Model-Written Evaluations Ethan Perez Sam Ringer Kamilė Lukošiūtė Karina Nguyen Edwin Chen ... Danny Hernandez Deep Ganguli Evan Hubinger Nicholas Schiefer Jared Kaplan ALM 97 407 0 19 Dec 2022
Linearly Mapping from Image to Text Space Jack Merullo Louis Castricato Carsten Eickhoff Ellie Pavlick VLM 248 118 0 30 Sep 2022
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space Mor Geva Avi Caciularu Ke Wang Yoav Goldberg KELM 144 389 0 28 Mar 2022
Revisiting Model Stitching to Compare Neural Representations Yamini Bansal Preetum Nakkiran Boaz Barak FedML 117 121 0 14 Jun 2021
Contrastive Learning Inverts the Data Generating Process Roland S. Zimmermann Yash Sharma Steffen Schneider Matthias Bethge Wieland Brendel SSL 376 223 0 17 Feb 2021
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models Samuel Gehman Suchin Gururangan Maarten Sap Yejin Choi Noah A. Smith 228 1,224 0 24 Sep 2020
Embedding Comparator: Visualizing Differences in Global Structure and Local Neighborhoods via Small Multiples Angie Boggust Brandon Carter Arvind Satyanarayan 102 65 0 10 Dec 2019
Gromov-Wasserstein Alignment of Word Embedding Spaces David Alvarez-Melis Tommi Jaakkola OT 60 328 0 31 Aug 2018
Adversarial Reprogramming of Neural Networks Gamaleldin F. Elsayed Ian Goodfellow Jascha Narain Sohl-Dickstein OOD AAML 55 183 0 28 Jun 2018
Residual Connections Encourage Iterative Inference Stanislaw Jastrzebski Devansh Arpit Nicolas Ballas Vikas Verma Tong Che Yoshua Bengio 95 156 0 13 Oct 2017
Deep Residual Learning for Image Recognition Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun MedIm 2.5K 195,053 0 10 Dec 2015
Understanding image representations by measuring their equivariance and equivalence Karel Lenc Andrea Vedaldi SSL FAtt 155 538 0 21 Nov 2014
Distributed Representations of Words and Phrases and their Compositionality Tomas Mikolov Ilya Sutskever Kai Chen G. Corrado J. Dean NAI OCL 429 33,605 0 16 Oct 2013
Exploiting Similarities among Languages for Machine Translation Tomas Mikolov Quoc V. Le Ilya Sutskever 111 1,597 0 17 Sep 2013