Implicit Representations of Meaning in Neural Language Models

1 June 2021


Papers citing "Implicit Representations of Meaning in Neural Language Models"

50 / 122 papers shown

Title | Authors | Tags | Counts | Date
Don't throw the baby out with the bathwater: How and why deep learning for ARC | Jack Cole Mohamed Osman | LRM | 45 0 0 | 17 Jun 2025
Large Language Models Do Multi-Label Classification Differently | Marcus Ma Georgios Chochlakis Niyantha Maruthu Pandiyan Jesse Thomason Shrikanth Narayanan | | 108 1 0 | 23 May 2025
Language Models use Lookbacks to Track Beliefs | Nikhil Prakash Natalie Shapira Arnab Sen Sharma Christoph Riedl Yonatan Belinkov Tamar Rott Shaham David Bau Atticus Geiger | KELM | 82 1 0 | 20 May 2025
Exploring How LLMs Capture and Represent Domain-Specific Knowledge | Mirian Hipolito Garcia Camille Couturier Daniel Madrigal Diaz Ankur Mallick Anastasios Kyrillidis Robert Sim Victor Rühle Saravan Rajmohan | | 75 1 0 | 23 Apr 2025
Revisiting the Othello World Model Hypothesis | Yifei Yuan Anders Søgaard | LRM | 97 0 0 | 06 Mar 2025
Towards Understanding Distilled Reasoning Models: A Representational Approach | David D. Baek Max Tegmark | LRM | 116 6 0 | 05 Mar 2025
(How) Do Language Models Track State? | Belinda Z. Li Zifan Carl Guo Jacob Andreas | LRM | 115 3 0 | 04 Mar 2025
Grandes modelos de lenguaje: de la predicción de palabras a la comprensión? | Carlos Gómez-Rodríguez | SyDa AILaw ELM VLM | 267 0 0 | 25 Feb 2025
From Text to Space: Mapping Abstract Spatial Models in LLMs during a Grid-World Navigation Task | Nicolas Martorell | LLMAG | 136 2 0 | 23 Feb 2025
Abstraction Alignment: Comparing Model-Learned and Human-Encoded Conceptual Relationships | Angie Boggust Hyemin Bang Hendrik Strobelt Arvindmani Satyanarayan | | 108 1 0 | 17 Feb 2025
MET-Bench: Multimodal Entity Tracking for Evaluating the Limitations of Vision-Language and Reasoning Models | Vanya Cohen Raymond J. Mooney | | 114 0 0 | 15 Feb 2025
Mechanistic Interpretability of Emotion Inference in Large Language Models | Ala Nekouvaght Tak Amin Banayeeanzade Anahita Bolourani Mina Kian Robin Jia Jonathan Gratch | | 110 0 0 | 08 Feb 2025
Harmonic Loss Trains Interpretable AI Models | David D. Baek Ziming Liu Riya Tyagi Max Tegmark | | 159 2 0 | 03 Feb 2025
Emergent Stack Representations in Modeling Counter Languages Using Transformers | Utkarsh Tiwari Aviral Gupta Michael Hahn | | 502 0 0 | 03 Feb 2025
ICLR: In-Context Learning of Representations | Core Francisco Park Andrew Lee Ekdeep Singh Lubana Yongyi Yang Maya Okawa Kento Nishi Martin Wattenberg Hidenori Tanaka | AIFin | 252 6 0 | 29 Dec 2024
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities | Zhaofeng Wu Xinyan Velocity Yu Dani Yogatama Jiasen Lu Yoon Kim | AIFin | 163 22 0 | 07 Nov 2024
Generative linguistics contribution to artificial intelligence: Where this contribution lies? | Mohammed Q. Shormani | AI4CE | 63 1 0 | 26 Oct 2024
Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2 | Mohamad Abdi Gerardo Hermosillo Valadez H. Yerebakan | MedIm | 61 0 0 | 16 Oct 2024
Systems with Switching Causal Relations: A Meta-Causal Perspective | Moritz Willig Tim Nelson Tobiasch Florian Peter Busch Jonas Seng Devendra Singh Dhami Kristian Kersting | CML | 152 0 0 | 16 Oct 2024
Exploring Natural Language-Based Strategies for Efficient Number Learning in Children through Reinforcement Learning | Tirthankar Mittra | | 48 0 0 | 10 Oct 2024
Generalization from Starvation: Hints of Universality in LLM Knowledge Graph Learning | David D. Baek Yuxiao Li Max Tegmark | | 76 2 0 | 10 Oct 2024
Chip-Tuning: Classify Before Language Models Say | Fangwei Zhu Dian Li Jiajun Huang Gang Liu Hui Wang Zhifang Sui | | 62 0 0 | 09 Oct 2024
Chain and Causal Attention for Efficient Entity Tracking | Erwan Fagnou Paul Caillon Blaise Delattre Alexandre Allauzen | | 95 5 0 | 07 Oct 2024
Counterfactual Token Generation in Large Language Models | Ivi Chatzi N. C. Benz Eleni Straitouri Stratis Tsirtsis Manuel Gomez Rodriguez | LRM | 119 5 0 | 25 Sep 2024
Perception-guided Jailbreak against Text-to-Image Models | Yihao Huang Le Liang Tianlin Li Xiaojun Jia Run Wang Weikai Miao G. Pu Yang Liu | | 127 11 0 | 20 Aug 2024
Understanding Generative AI Content with Embedding Models | Max Vargas Reilly Cannon A. Engel Anand D. Sarwate Tony Chiang | | 224 3 0 | 19 Aug 2024
Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data | Charles Jin Martin Rinard | | 88 1 0 | 18 Jul 2024
States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly | Junhao Chen Shengding Hu Zhiyuan Liu Maosong Sun | LRM | 84 5 0 | 16 Jul 2024
Monitoring Latent World States in Language Models with Propositional Probes | Jiahai Feng Stuart Russell Jacob Steinhardt | HILM | 89 14 0 | 27 Jun 2024
Does GPT Really Get It? A Hierarchical Scale to Quantify Human vs AI's Understanding of Algorithms | Mirabel Reid Santosh Vempala | ELM | 93 0 0 | 20 Jun 2024
Estimating Knowledge in Large Language Models Without Generating a Single Token | Daniela Gottesman Mor Geva | | 97 14 0 | 18 Jun 2024
Refusal in Language Models Is Mediated by a Single Direction | Andy Arditi Oscar Obeso Aaquib Syed Daniel Paleka Nina Panickssery Wes Gurnee Neel Nanda | | 171 218 0 | 17 Jun 2024
A Notion of Complexity for Theory of Mind via Discrete World Models | X. A. Huang Emanuele La Malfa Samuele Marro Andrea Asperti Anthony Cohn Michael Wooldridge | | 95 8 0 | 16 Jun 2024
What Should Embeddings Embed? Autoregressive Models Represent Latent Generating Distributions | Liyi Zhang Michael Y. Li Thomas Griffiths | | 77 3 0 | 06 Jun 2024
Evaluating the World Model Implicit in a Generative Model | Keyon Vafa Justin Y. Chen Jon M. Kleinberg S. Mullainathan Ashesh Rambachan | | 166 41 0 | 06 Jun 2024
InversionView: A General-Purpose Method for Reading Information from Neural Activations | Xinting Huang Madhur Panwar Navin Goyal Michael Hahn | | 101 5 0 | 27 May 2024
Implicit In-context Learning | Zhuowei Li Zihao Xu Ligong Han Yunhe Gao Song Wen Di Liu Hao Wang Dimitris N. Metaxas | | 149 3 0 | 23 May 2024
Simulating Policy Impacts: Developing a Generative Scenario Writing Method to Evaluate the Perceived Effects of Regulation | Julia Barnett Kimon Kieslich Nicholas Diakopoulos | | 57 5 0 | 15 May 2024
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control | Aleksandar Makelov Georg Lange Neel Nanda | | 79 41 0 | 14 May 2024
A Philosophical Introduction to Language Models - Part II: The Way Forward | Raphael Milliere Cameron Buckner | LRM | 124 15 0 | 06 May 2024
Mechanistic Interpretability for AI Safety -- A Review | Leonard Bereska E. Gavves | AI4CE | 139 158 0 | 22 Apr 2024
SelfIE: Self-Interpretation of Large Language Model Embeddings | Haozhe Chen Carl Vondrick Chengzhi Mao | | 67 27 0 | 16 Mar 2024
Towards a theory of model distillation | Enric Boix-Adserà | FedML VLM | 80 8 0 | 14 Mar 2024
Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models | Chao Qian Jie Zhang Wei Yao Dongrui Liu Zhen-fei Yin Yu Qiao Yong Liu Jing Shao | LLMSV LRM | 98 12 0 | 29 Feb 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations | Jing-ling Huang Zhengxuan Wu Christopher Potts Mor Geva Atticus Geiger | | 130 35 0 | 27 Feb 2024
What Do Language Models Hear? Probing for Auditory Representations in Language Models | Jerry Ngo Yoon Kim | AuLLM MILM | 66 8 0 | 26 Feb 2024
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking | Nikhil Prakash Tamar Rott Shaham Tal Haklay Yonatan Belinkov David Bau | | 99 67 0 | 22 Feb 2024
On the Tip of the Tongue: Analyzing Conceptual Representation in Large Language Models with Reverse-Dictionary Probe | Ningyu Xu Qi Zhang Menghan Zhang Peng Qian Xuanjing Huang | LRM | 124 3 0 | 22 Feb 2024
Strong hallucinations from negation and how to fix them | Nicholas Asher Swarnadeep Bhar | ReLM LRM | 54 5 0 | 16 Feb 2024
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing | Bryan Wang Yuliang Li Zhaoyang Lv Haijun Xia Yan Xu Raj Sodhi | | 92 53 0 | 15 Feb 2024