2402.19465
Cited By
Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models
29 February 2024
Chao Qian
Jie Zhang
Wei Yao
Dongrui Liu
Zhen-fei Yin
Yu Qiao
Yong Liu
Jing Shao
LLMSV
LRM
Papers citing
"Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models"
17 / 17 papers shown
Title
InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance
Pengyu Wang
Dong Zhang
Linyang Li
Chenkun Tan
Xinghao Wang
Ke Ren
Botian Jiang
Xipeng Qiu
LLMSV
49
45
0
20 Jan 2024
LLM360: Towards Fully Transparent Open-Source LLMs
Zhengzhong Liu
Aurick Qiao
Willie Neiswanger
Hongyi Wang
Bowen Tan
...
Zhiting Hu
Mark Schulze
Preslav Nakov
Timothy Baldwin
Eric Xing
91
75
0
11 Dec 2023
An Emulator for Fine-Tuning Large Language Models using Small Language Models
Eric Mitchell
Rafael Rafailov
Archit Sharma
Chelsea Finn
Christopher D. Manning
ALM
59
55
0
19 Oct 2023
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Zhiqing Sun
Songlin Yang
Qinhong Zhou
Hongxin Zhang
Zhenfang Chen
David D. Cox
Yiming Yang
Chuang Gan
SyDa
ALM
60
329
0
04 May 2023
On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective
Jindong Wang
Xixu Hu
Wenxin Hou
Hao Chen
Runkai Zheng
...
Weirong Ye
Xiubo Geng
Binxing Jiao
Yue Zhang
Xingxu Xie
AI4MH
90
227
0
22 Feb 2023
Discovering Latent Knowledge in Language Models Without Supervision
Collin Burns
Haotian Ye
Dan Klein
Jacob Steinhardt
109
350
0
07 Dec 2022
A Closer Look at How Fine-tuning Changes BERT
Yichu Zhou
Vivek Srikumar
37
67
0
27 Jun 2021
Information Bottleneck: Exact Analysis of (Quantized) Neural Networks
S. Lorenzen
Christian Igel
M. Nielsen
MQ
33
18
0
24 Jun 2021
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
243
427
0
24 Feb 2021
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
Armen Aghajanyan
Luke Zettlemoyer
Sonal Gupta
82
549
1
22 Dec 2020
To be Robust or to be Fair: Towards Fairness in Adversarial Training
Han Xu
Xiaorui Liu
Yaxin Li
Anil K. Jain
Jiliang Tang
41
179
0
13 Oct 2020
The Information Bottleneck Problem and Its Applications in Machine Learning
Ziv Goldfeld
Yury Polyanskiy
41
133
0
30 Apr 2020
What Happens To BERT Embeddings During Fine-tuning?
Amil Merchant
Elahe Rahimtoroghi
Ellie Pavlick
Ian Tenney
57
186
0
29 Apr 2020
How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations
Betty van Aken
B. Winter
Alexander Löser
Felix Alexander Gers
50
153
0
11 Sep 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
467
24,160
0
26 Jul 2019
What do you learn from context? Probing for sentence structure in contextualized word representations
Ian Tenney
Patrick Xia
Berlin Chen
Alex Jinpeng Wang
Adam Poliak
...
Najoung Kim
Benjamin Van Durme
Samuel R. Bowman
Dipanjan Das
Ellie Pavlick
163
853
0
15 May 2019
Deep Learning and the Information Bottleneck Principle
Naftali Tishby
Noga Zaslavsky
DRL
145
1,570
0
09 Mar 2015