Discovering Latent Knowledge in Language Models Without Supervision

7 December 2022
Collin Burns
Haotian Ye
Dan Klein
Jacob Steinhardt
arXiv: 2212.03827

Papers citing "Discovering Latent Knowledge in Language Models Without Supervision"

50 / 269 papers shown
Rescue: Ranking LLM Responses with Partial Ordering to Improve Response Generation
Yikun Wang
Rui Zheng
Haoming Li
Qi Zhang
Tao Gui
Fei Liu
OffRL
25
3
0
15 Nov 2023
Towards Evaluating AI Systems for Moral Status Using Self-Reports
Ethan Perez
Robert Long
ELM
36
8
0
14 Nov 2023
A Survey of Confidence Estimation and Calibration in Large Language Models
Jiahui Geng
Fengyu Cai
Yuxia Wang
Heinz Koeppl
Preslav Nakov
Iryna Gurevych
UQCV
41
54
0
14 Nov 2023
Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains
Joshua Clymer
Garrett Baker
Rohan Subramani
Sam Wang
19
6
0
13 Nov 2023
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Sheng Liu
Haotian Ye
Lei Xing
James Y. Zou
26
83
0
11 Nov 2023
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang
Weijiang Yu
Weitao Ma
Weihong Zhong
Zhangyin Feng
...
Qianglong Chen
Weihua Peng
Xiaocheng Feng
Bing Qin
Ting Liu
LRM
HILM
39
718
0
09 Nov 2023
Training Dynamics of Contextual N-Grams in Language Models
Lucia Quirke
Lovis Heindrich
Wes Gurnee
Neel Nanda
18
4
0
01 Nov 2023
Comparing Optimization Targets for Contrast-Consistent Search
Hugo Fry
S. Fallows
Ian Fan
Jamie Wright
Nandi Schoots
11
2
0
01 Nov 2023
The Expressibility of Polynomial based Attention Scheme
Zhao-quan Song
Guangyi Xu
Junze Yin
32
5
0
30 Oct 2023
Personas as a Way to Model Truthfulness in Language Models
Nitish Joshi
Javier Rando
Abulhair Saparov
Najoung Kim
He He
HILM
20
27
0
27 Oct 2023
Implicit meta-learning may lead language models to trust more reliable sources
Dmitrii Krasheninnikov
Egor Krasheninnikov
Bruno Mlodozeniec
Tegan Maharaj
David M. Krueger
26
3
0
23 Oct 2023
Self-Consistency of Large Language Models under Ambiguity
Henning Bartsch
Ole Jorgensen
Domenic Rosati
Jason Hoelscher-Obermaier
Jacob Pfau
HILM
25
4
0
20 Oct 2023
Getting aligned on representational alignment
Ilia Sucholutsky
Lukas Muttenthaler
Adrian Weller
Andi Peng
Andreea Bobu
...
Thomas Unterthiner
Andrew Kyle Lampinen
Klaus-Robert Muller
M. Toneva
Thomas L. Griffiths
58
74
0
18 Oct 2023
Understanding and Controlling a Maze-Solving Policy Network
Ulisse Mini
Peli Grietzer
Mrinank Sharma
Austin Meek
M. MacDiarmid
Alexander Matt Turner
14
15
0
12 Oct 2023
Measuring Feature Sparsity in Language Models
Mingyang Deng
Lucas Tao
Joe Benton
21
1
0
11 Oct 2023
Teaching Language Models to Hallucinate Less with Synthetic Tasks
Erik Jones
Hamid Palangi
Clarisse Simoes
Varun Chandrasekaran
Subhabrata Mukherjee
Arindam Mitra
Ahmed Hassan Awadallah
Ece Kamar
HILM
21
24
0
10 Oct 2023
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks
Max Tegmark
HILM
102
169
0
10 Oct 2023
Towards Mitigating Hallucination in Large Language Models via Self-Reflection
Ziwei Ji
Tiezheng Yu
Yan Xu
Nayeon Lee
Etsuko Ishii
Pascale Fung
HILM
11
55
0
10 Oct 2023
Language Models Represent Space and Time
Wes Gurnee
Max Tegmark
35
141
0
03 Oct 2023
Benchmarking and Improving Generator-Validator Consistency of Language Models
Xiang Lisa Li
Vaishnavi Shrivastava
Siyan Li
Tatsunori Hashimoto
Percy Liang
19
27
0
03 Oct 2023
Siamese Representation Learning for Unsupervised Relation Extraction
Guangxin Zhang
Shu Chen
SSL
15
2
0
01 Oct 2023
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Mert Yuksekgonul
Varun Chandrasekaran
Erik Jones
Suriya Gunasekar
Ranjita Naik
Hamid Palangi
Ece Kamar
Besmira Nushi
HILM
18
40
0
26 Sep 2023
How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
Lorenzo Pacchiardi
A. J. Chan
Sören Mindermann
Ilan Moscovitz
Alexa Y. Pan
Y. Gal
Owain Evans
J. Brauner
LLMAG
HILM
22
48
0
26 Sep 2023
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Yeqi Gao
Zhao-quan Song
Weixin Wang
Junze Yin
20
25
0
14 Sep 2023
Unsupervised Contrast-Consistent Ranking with Language Models
Niklas Stoehr
Pengxiang Cheng
Jing Wang
Daniel Preotiuc-Pietro
Rajarshi Bhowmik
ALM
31
11
0
13 Sep 2023
Emergent Linear Representations in World Models of Self-Supervised Sequence Models
Neel Nanda
Andrew Lee
Martin Wattenberg
FAtt
MILM
42
143
0
02 Sep 2023
Benchmarks for Detecting Measurement Tampering
Fabien Roger
Ryan Greenblatt
Max Nadeau
Buck Shlegeris
Nate Thomas
28
2
0
29 Aug 2023
AI Deception: A Survey of Examples, Risks, and Potential Solutions
Peter S. Park
Simon Goldstein
Aidan O'Gara
Michael Chen
Dan Hendrycks
27
140
0
28 Aug 2023
Situated Natural Language Explanations
Zining Zhu
Hao Jiang
Jingfeng Yang
Sreyashi Nag
Chao Zhang
Jie Huang
Yifan Gao
Frank Rudzicz
Bing Yin
LRM
41
1
0
27 Aug 2023
GradientCoin: A Peer-to-Peer Decentralized Large Language Models
Yeqi Gao
Zhao-quan Song
Junze Yin
23
18
0
21 Aug 2023
Deception Abilities Emerged in Large Language Models
Thilo Hagendorff
LLMAG
35
75
0
31 Jul 2023
Can Large Language Models Aid in Annotating Speech Emotional Data? Uncovering New Frontiers
S. Latif
Muhammad Usama
Mohammad Ibrahim Malik
Björn W. Schuller
32
17
0
12 Jul 2023
Large Language Models
Michael R Douglas
LLMAG
LM&MA
40
557
0
11 Jul 2023
Discovering Variable Binding Circuitry with Desiderata
Xander Davies
Max Nadeau
Nikhil Prakash
Tamar Rott Shaham
David Bau
25
12
0
07 Jul 2023
Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models
Aidan O'Gara
13
35
0
05 Jul 2023
Still No Lie Detector for Language Models: Probing Empirical and Conceptual Roadblocks
B. Levinstein
Daniel A. Herrmann
17
54
0
30 Jun 2023
An Overview of Catastrophic AI Risks
Dan Hendrycks
Mantas Mazeika
Thomas Woodside
SILM
26
165
0
21 Jun 2023
Evaluating Superhuman Models with Consistency Checks
Lukas Fluri
Daniel Paleka
Florian Tramèr
ELM
42
42
0
16 Jun 2023
Explore, Establish, Exploit: Red Teaming Language Models from Scratch
Stephen Casper
Jason Lin
Joe Kwon
Gatlen Culp
Dylan Hadfield-Menell
AAML
8
83
0
15 Jun 2023
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Kenneth Li
Oam Patel
Fernanda Viégas
Hanspeter Pfister
Martin Wattenberg
KELM
HILM
26
475
0
06 Jun 2023
Encoding Time-Series Explanations through Self-Supervised Model Behavior Consistency
Owen Queen
Thomas Hartvigsen
Teddy Koker
Huan He
Theodoros Tsiligkaridis
Marinka Zitnik
AI4TS
37
17
0
03 Jun 2023
Incentivizing honest performative predictions with proper scoring rules
Caspar Oesterheld
Johannes Treutlein
Emery Cooper
Rubi Hudson
33
5
0
28 May 2023
Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language Models
Yuhui Zhang
Michihiro Yasunaga
Zhengping Zhou
Jeff Z. HaoChen
James Y. Zou
Percy Liang
Serena Yeung
44
7
0
27 May 2023
Language Models Implement Simple Word2Vec-style Vector Arithmetic
Jack Merullo
Carsten Eickhoff
Ellie Pavlick
KELM
26
52
0
25 May 2023
Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective
Guhao Feng
Bohang Zhang
Yuntian Gu
Haotian Ye
Di He
Liwei Wang
LRM
27
215
0
24 May 2023
Model evaluation for extreme risks
Toby Shevlane
Sebastian Farquhar
Ben Garfinkel
Mary Phuong
Jess Whittlestone
...
Vijay Bolina
Jack Clark
Yoshua Bengio
Paul Christiano
Allan Dafoe
ELM
32
152
0
24 May 2023
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis
Alessandro Stolfo
Yonatan Belinkov
Mrinmaya Sachan
MILM
KELM
LRM
33
47
0
24 May 2023
The Knowledge Alignment Problem: Bridging Human and External Knowledge for Large Language Models
Shuo Zhang
Liangming Pan
Junzhou Zhao
W. Wang
HILM
26
0
0
23 May 2023
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Miles Turpin
Julian Michael
Ethan Perez
Sam Bowman
ReLM
LRM
27
379
0
07 May 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
160
186
0
02 May 2023