Title |
---|
![]() Multi-turn Reinforcement Learning from Preference Human Feedback Lior Shani Aviv Rosenberg Asaf B. Cassel Oran Lang Daniele Calandriello ...Bilal Piot Idan Szpektor Avinatan Hassidim Yossi Matias Rémi Munos |
![]() A Survey on Self-Evolution of Large Language Models Zhengwei Tao Ting-En Lin Xiancai Chen Hangyu Li Yuchuan Wu Yongbin Li Zhi Jin Fei Huang Dacheng Tao Jingren Zhou |