2307.09009
Cited By
How is ChatGPT's behavior changing over time?
18 July 2023
Lingjiao Chen
Matei A. Zaharia
James Y. Zou
ELM
KELM
AI4MH
Papers citing "How is ChatGPT's behavior changing over time?"
48 / 48 papers shown
Title
Improving Model Alignment Through Collective Intelligence of Open-Source LLMS
Junlin Wang
Roy Xie
Shang Zhu
Jue Wang
Ben Athiwaratkun
Bhuwan Dhingra
S. Song
Ce Zhang
James Y. Zou
ALM
31
0
0
05 May 2025
Bye-bye, Bluebook? Automating Legal Procedure with Large Language Models
Matthew Dahl
AILaw
ELM
54
0
0
05 May 2025
Memorization and Knowledge Injection in Gated LLMs
Xu Pan
Ely Hahami
Zechen Zhang
H. Sompolinsky
KELM
CLL
RALM
104
1
0
30 Apr 2025
LLM-Evaluation Tropes: Perspectives on the Validity of LLM-Evaluations
Laura Dietz
Oleg Zendel
P. Bailey
Charles L. A. Clarke
Ellese Cotterill
Jeff Dalton
Faegheh Hasibi
Mark Sanderson
Nick Craswell
ELM
48
0
0
27 Apr 2025
Improving LLM Personas via Rationalization with Psychological Scaffolds
Brihi Joshi
Xiang Ren
Swabha Swayamdipta
Rik Koncel-Kedziorski
Tim Paek
73
0
0
25 Apr 2025
Testing LLMs' Capabilities in Annotating Translations Based on an Error Typology Designed for LSP Translation: First Experiments with ChatGPT
Joachim Minder
Guillaume Wisniewski
Natalie Kübler
28
0
0
21 Apr 2025
Assessing how hyperparameters impact Large Language Models' sarcasm detection performance
Montgomery Gole
Andriy Miranskyy
AI4MH
21
0
0
08 Apr 2025
Generalization Bias in Large Language Model Summarization of Scientific Research
Uwe Peters
Benjamin Chin-Yee
ELM
34
0
0
28 Mar 2025
RobuNFR: Evaluating the Robustness of Large Language Models on Non-Functional Requirements Aware Code Generation
Feng Lin
Dong Jae Kim
Z. Li
Jinqiu Yang
Tse-Hsun Chen
AAML
38
0
0
28 Mar 2025
I'm Sorry Dave: How the old world of personnel security can inform the new world of AI insider risk
Paul Martin
Sarah Mercer
161
0
0
26 Mar 2025
Demonstrating specification gaming in reasoning models
Alexander Bondarenko
Denis Volk
Dmitrii Volkov
Jeffrey Ladish
LRM
LLMAG
44
3
0
18 Feb 2025
Unlocking the Power of Function Vectors for Characterizing and Mitigating Catastrophic Forgetting in Continual Instruction Tuning
Gangwei Jiang
Caigao Jiang
Zhaoyi Li
Siqiao Xue
Jun-ping Zhou
Linqi Song
Defu Lian
Yin Wei
CLL
MU
60
0
0
16 Feb 2025
The Cake that is Intelligence and Who Gets to Bake it: An AI Analogy and its Implications for Participation
Martin Mundt
Anaelia Ovalle
Felix Friedrich
A Pranav
Subarnaduti Paul
Manuel Brack
Kristian Kersting
William Agnew
273
0
0
05 Feb 2025
PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs
Jiahao Yu
Yangguang Shao
Hanwen Miao
Junzheng Shi
SILM
AAML
69
4
0
23 Sep 2024
Prompts Are Programs Too! Understanding How Developers Build Software Containing Prompts
Jenny T Liang
Melissa Lin
Nikitha Rao
Brad A. Myers
75
5
0
19 Sep 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Shafiq R. Joty
Jimmy Huang
ELM
ALM
29
27
0
04 Jul 2024
Cascade Reward Sampling for Efficient Decoding-Time Alignment
Bolian Li
Yifan Wang
A. Grama
Ruqi Zhang
AI4TS
49
9
0
24 Jun 2024
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
Varun Magesh
Faiz Surani
Matthew Dahl
Mirac Suzgun
Christopher D. Manning
Daniel E. Ho
HILM
ELM
AILaw
27
66
0
30 May 2024
CHARP: Conversation History AwaReness Probing for Knowledge-grounded Dialogue Systems
Abbas Ghaddar
David Alfonso-Hermelo
Philippe Langlais
Mehdi Rezagholizadeh
Boxing Chen
Prasanna Parthasarathi
36
0
0
24 May 2024
What is it for a Machine Learning Model to Have a Capability?
Jacqueline Harding
Nathaniel Sharadin
ELM
38
3
0
14 May 2024
"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations
Preetam Prabhu Srikar Dammu
Hayoung Jung
Anjali Singh
Monojit Choudhury
Tanushree Mitra
32
8
0
08 May 2024
"ChatGPT Is Here to Help, Not to Replace Anybody" -- An Evaluation of Students' Opinions On Integrating ChatGPT In CS Courses
Bruno Pereira Cipriano
P. Alves
39
9
0
26 Apr 2024
Measuring Social Norms of Large Language Models
Ye Yuan
Kexin Tang
Jianhao Shen
Ming Zhang
Chenguang Wang
ELM
30
6
0
03 Apr 2024
Dialectical Alignment: Resolving the Tension of 3H and Security Threats of LLMs
Shu Yang
Jiayuan Su
Han Jiang
Mengdi Li
Keyuan Cheng
Muhammad Asif Ali
Lijie Hu
Di Wang
35
5
0
30 Mar 2024
Designing Informative Metrics for Few-Shot Example Selection
Rishabh Adiga
Lakshminarayanan Subramanian
Varun Chandrasekaran
32
1
0
06 Mar 2024
Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs
Simone Balloccu
Patrícia Schmidtová
Mateusz Lango
Ondrej Dusek
SILM
ELM
PILM
21
156
0
06 Feb 2024
Evaluation of LLM Chatbots for OSINT-based Cyber Threat Awareness
Samaneh Shafee
A. Bessani
Pedro M. Ferreira
26
19
0
26 Jan 2024
Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality Assurance
Tinghui Ouyang
AprilPyone Maungmaung
Koichi Konishi
Yoshiki Seo
Isao Echizen
AI4MH
21
5
0
15 Jan 2024
Evaluating Language Model Agency through Negotiations
Tim R. Davidson
V. Veselovsky
Martin Josifoski
Maxime Peyrard
Antoine Bosselut
Michal Kosinski
Robert West
LLMAG
34
22
0
09 Jan 2024
ChatGPT & Mechanical Engineering: Examining performance on the FE Mechanical Engineering and Undergraduate Exams
Matthew Frenkel
Hebah Emara
26
2
0
26 Sep 2023
Watch Your Language: Investigating Content Moderation with Large Language Models
Deepak Kumar
Y. AbuHashem
Zakir Durumeric
AI4MH
33
15
0
25 Sep 2023
Trustworthy and Synergistic Artificial Intelligence for Software Engineering: Vision and Roadmaps
David Lo
34
39
0
08 Sep 2023
Prompting or Fine-tuning? A Comparative Study of Large Language Models for Taxonomy Construction
Boqi Chen
Fandi Yi
Dániel Varró
29
16
0
04 Sep 2023
Position: Key Claims in LLM Research Have a Long Tail of Footnotes
Anna Rogers
A. Luccioni
48
19
0
14 Aug 2023
GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher
Youliang Yuan
Wenxiang Jiao
Wenxuan Wang
Jen-tse Huang
Pinjia He
Shuming Shi
Zhaopeng Tu
SILM
70
231
0
12 Aug 2023
Assessing Student Errors in Experimentation Using Artificial Intelligence and Large Language Models: A Comparative Study with Human Raters
Arne Bewersdorff
Kathrin Seßler
Armin Baur
Enkelejda Kasneci
Claudia Nerdel
16
37
0
11 Aug 2023
Deception Abilities Emerged in Large Language Models
Thilo Hagendorff
LLMAG
35
75
0
31 Jul 2023
How Language Model Hallucinations Can Snowball
Muru Zhang
Ofir Press
William Merrill
Alisa Liu
Noah A. Smith
HILM
LRM
82
253
0
22 May 2023
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
Xiaowei Huang
Wenjie Ruan
Wei Huang
Gao Jin
Yizhen Dong
...
Sihao Wu
Peipei Xu
Dengyu Wu
André Freitas
Mustafa A. Mustafa
ALM
34
82
0
19 May 2023
Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning
Wenhao Li
Dan Qiao
Baoxiang Wang
Xiangfeng Wang
Bo Jin
H. Zha
35
5
0
18 May 2023
ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time
Shangqing Tu
Chunyang Li
Jifan Yu
Xiaozhi Wang
Lei Hou
Juanzi Li
LLMAG
AI4MH
75
10
0
27 Apr 2023
Can we trust the evaluation on ChatGPT?
Rachith Aiyappa
Jisun An
Haewoon Kwak
Yong-Yeol Ahn
ELM
ALM
LLMAG
AI4MH
LRM
117
87
0
22 Mar 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
289
3,003
0
22 Mar 2023
NormSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly
Yi Ren Fung
Tuhin Chakraborty
Hao Guo
Owen Rambow
Smaranda Muresan
Heng Ji
21
39
0
16 Oct 2022
HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions
Lingjiao Chen
Zhihua Jin
Sabri Eyuboglu
Christopher Ré
Matei A. Zaharia
James Y. Zou
48
9
0
18 Sep 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
225
444
0
23 Aug 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
358
8,495
0
28 Jan 2022
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
290
1,814
0
14 Dec 2020