2106.00737
Implicit Representations of Meaning in Neural Language Models
1 June 2021
Belinda Z. Li
Maxwell Nye
Jacob Andreas
NAI
MILM
Github (54★)
Papers citing
"Implicit Representations of Meaning in Neural Language Models"
50 / 122 papers shown
Title
Don't throw the baby out with the bathwater: How and why deep learning for ARC
Jack Cole
Mohamed Osman
LRM
45
0
0
17 Jun 2025
Large Language Models Do Multi-Label Classification Differently
Marcus Ma
Georgios Chochlakis
Niyantha Maruthu Pandiyan
Jesse Thomason
Shrikanth Narayanan
108
1
0
23 May 2025
Language Models use Lookbacks to Track Beliefs
Nikhil Prakash
Natalie Shapira
Arnab Sen Sharma
Christoph Riedl
Yonatan Belinkov
Tamar Rott Shaham
David Bau
Atticus Geiger
KELM
82
1
0
20 May 2025
Exploring How LLMs Capture and Represent Domain-Specific Knowledge
Mirian Hipolito Garcia
Camille Couturier
Daniel Madrigal Diaz
Ankur Mallick
Anastasios Kyrillidis
Robert Sim
Victor Rühle
Saravan Rajmohan
75
1
0
23 Apr 2025
Revisiting the Othello World Model Hypothesis
Yifei Yuan
Anders Søgaard
LRM
97
0
0
06 Mar 2025
Towards Understanding Distilled Reasoning Models: A Representational Approach
David D. Baek
Max Tegmark
LRM
116
6
0
05 Mar 2025
(How) Do Language Models Track State?
Belinda Z. Li
Zifan Carl Guo
Jacob Andreas
LRM
115
3
0
04 Mar 2025
Grandes modelos de lenguaje: de la predicción de palabras a la comprensión? [Large language models: from word prediction to understanding?]
Carlos Gómez-Rodríguez
SyDa
AILaw
ELM
VLM
267
0
0
25 Feb 2025
From Text to Space: Mapping Abstract Spatial Models in LLMs during a Grid-World Navigation Task
Nicolas Martorell
LLMAG
136
2
0
23 Feb 2025
Abstraction Alignment: Comparing Model-Learned and Human-Encoded Conceptual Relationships
Angie Boggust
Hyemin Bang
Hendrik Strobelt
Arvindmani Satyanarayan
108
1
0
17 Feb 2025
MET-Bench: Multimodal Entity Tracking for Evaluating the Limitations of Vision-Language and Reasoning Models
Vanya Cohen
Raymond J. Mooney
114
0
0
15 Feb 2025
Mechanistic Interpretability of Emotion Inference in Large Language Models
Ala Nekouvaght Tak
Amin Banayeeanzade
Anahita Bolourani
Mina Kian
Robin Jia
Jonathan Gratch
110
0
0
08 Feb 2025
Harmonic Loss Trains Interpretable AI Models
David D. Baek
Ziming Liu
Riya Tyagi
Max Tegmark
159
2
0
03 Feb 2025
Emergent Stack Representations in Modeling Counter Languages Using Transformers
Utkarsh Tiwari
Aviral Gupta
Michael Hahn
502
0
0
03 Feb 2025
ICLR: In-Context Learning of Representations
Core Francisco Park
Andrew Lee
Ekdeep Singh Lubana
Yongyi Yang
Maya Okawa
Kento Nishi
Martin Wattenberg
Hidenori Tanaka
AIFin
252
6
0
29 Dec 2024
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
Zhaofeng Wu
Xinyan Velocity Yu
Dani Yogatama
Jiasen Lu
Yoon Kim
AIFin
163
22
0
07 Nov 2024
Generative linguistics contribution to artificial intelligence: Where this contribution lies?
Mohammed Q. Shormani
AI4CE
63
1
0
26 Oct 2024
Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2
Mohamad Abdi
Gerardo Hermosillo Valadez
H. Yerebakan
MedIm
61
0
0
16 Oct 2024
Systems with Switching Causal Relations: A Meta-Causal Perspective
Moritz Willig
Tim Nelson Tobiasch
Florian Peter Busch
Jonas Seng
Devendra Singh Dhami
Kristian Kersting
CML
152
0
0
16 Oct 2024
Exploring Natural Language-Based Strategies for Efficient Number Learning in Children through Reinforcement Learning
Tirthankar Mittra
48
0
0
10 Oct 2024
Generalization from Starvation: Hints of Universality in LLM Knowledge Graph Learning
David D. Baek
Yuxiao Li
Max Tegmark
76
2
0
10 Oct 2024
Chip-Tuning: Classify Before Language Models Say
Fangwei Zhu
Dian Li
Jiajun Huang
Gang Liu
Hui Wang
Zhifang Sui
62
0
0
09 Oct 2024
Chain and Causal Attention for Efficient Entity Tracking
Erwan Fagnou
Paul Caillon
Blaise Delattre
Alexandre Allauzen
95
5
0
07 Oct 2024
Counterfactual Token Generation in Large Language Models
Ivi Chatzi
N. C. Benz
Eleni Straitouri
Stratis Tsirtsis
Manuel Gomez Rodriguez
LRM
119
5
0
25 Sep 2024
Perception-guided Jailbreak against Text-to-Image Models
Yihao Huang
Le Liang
Tianlin Li
Xiaojun Jia
Run Wang
Weikai Miao
G. Pu
Yang Liu
127
11
0
20 Aug 2024
Understanding Generative AI Content with Embedding Models
Max Vargas
Reilly Cannon
A. Engel
Anand D. Sarwate
Tony Chiang
224
3
0
19 Aug 2024
Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data
Charles Jin
Martin Rinard
88
1
0
18 Jul 2024
States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly
Junhao Chen
Shengding Hu
Zhiyuan Liu
Maosong Sun
LRM
84
5
0
16 Jul 2024
Monitoring Latent World States in Language Models with Propositional Probes
Jiahai Feng
Stuart Russell
Jacob Steinhardt
HILM
89
14
0
27 Jun 2024
Does GPT Really Get It? A Hierarchical Scale to Quantify Human vs AI's Understanding of Algorithms
Mirabel Reid
Santosh Vempala
ELM
93
0
0
20 Jun 2024
Estimating Knowledge in Large Language Models Without Generating a Single Token
Daniela Gottesman
Mor Geva
97
14
0
18 Jun 2024
Refusal in Language Models Is Mediated by a Single Direction
Andy Arditi
Oscar Obeso
Aaquib Syed
Daniel Paleka
Nina Panickssery
Wes Gurnee
Neel Nanda
171
218
0
17 Jun 2024
A Notion of Complexity for Theory of Mind via Discrete World Models
X. A. Huang
Emanuele La Malfa
Samuele Marro
Andrea Asperti
Anthony Cohn
Michael Wooldridge
95
8
0
16 Jun 2024
What Should Embeddings Embed? Autoregressive Models Represent Latent Generating Distributions
Liyi Zhang
Michael Y. Li
Thomas Griffiths
77
3
0
06 Jun 2024
Evaluating the World Model Implicit in a Generative Model
Keyon Vafa
Justin Y. Chen
Jon M. Kleinberg
S. Mullainathan
Ashesh Rambachan
166
41
0
06 Jun 2024
InversionView: A General-Purpose Method for Reading Information from Neural Activations
Xinting Huang
Madhur Panwar
Navin Goyal
Michael Hahn
101
5
0
27 May 2024
Implicit In-context Learning
Zhuowei Li
Zihao Xu
Ligong Han
Yunhe Gao
Song Wen
Di Liu
Hao Wang
Dimitris N. Metaxas
149
3
0
23 May 2024
Simulating Policy Impacts: Developing a Generative Scenario Writing Method to Evaluate the Perceived Effects of Regulation
Julia Barnett
Kimon Kieslich
Nicholas Diakopoulos
57
5
0
15 May 2024
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Aleksandar Makelov
Georg Lange
Neel Nanda
79
41
0
14 May 2024
A Philosophical Introduction to Language Models - Part II: The Way Forward
Raphael Milliere
Cameron Buckner
LRM
124
15
0
06 May 2024
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska
E. Gavves
AI4CE
139
158
0
22 Apr 2024
SelfIE: Self-Interpretation of Large Language Model Embeddings
Haozhe Chen
Carl Vondrick
Chengzhi Mao
67
27
0
16 Mar 2024
Towards a theory of model distillation
Enric Boix-Adserà
FedML
VLM
80
8
0
14 Mar 2024
Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models
Chao Qian
Jie Zhang
Wei Yao
Dongrui Liu
Zhen-fei Yin
Yu Qiao
Yong Liu
Jing Shao
LLMSV
LRM
98
12
0
29 Feb 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
Jing-ling Huang
Zhengxuan Wu
Christopher Potts
Mor Geva
Atticus Geiger
130
35
0
27 Feb 2024
What Do Language Models Hear? Probing for Auditory Representations in Language Models
Jerry Ngo
Yoon Kim
AuLLM
MILM
66
8
0
26 Feb 2024
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
Nikhil Prakash
Tamar Rott Shaham
Tal Haklay
Yonatan Belinkov
David Bau
99
67
0
22 Feb 2024
On the Tip of the Tongue: Analyzing Conceptual Representation in Large Language Models with Reverse-Dictionary Probe
Ningyu Xu
Qi Zhang
Menghan Zhang
Peng Qian
Xuanjing Huang
LRM
124
3
0
22 Feb 2024
Strong hallucinations from negation and how to fix them
Nicholas Asher
Swarnadeep Bhar
ReLM
LRM
54
5
0
16 Feb 2024
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing
Bryan Wang
Yuliang Li
Zhaoyang Lv
Haijun Xia
Yan Xu
Raj Sodhi
92
53
0
15 Feb 2024