Kimi k1.5: Scaling Reinforcement Learning with LLMs


22 January 2025
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
Cheng Chen
Cheng Li
Chenjun Xiao
C. Du
Chonghua Liao
C. Tang
C. Wang
Dehao Zhang
Enming Yuan
Enzhe Lu
Fengxiang Tang
Flood Sung
Guangda Wei
Guokun Lai
Haiqing Guo
Han Zhu
Hao Ding
Hao Hu
Hao Yang
Hao Zhang
Haotian Yao
Haotian Zhao
Haoyu Lu
Hao Li
Haozhen Yu
Hongcheng Gao
Huabin Zheng
Huan Yuan
Jia-Yu Chen
Jianhang Guo
Jianlin Su
J. Wang
J. Zhao
Jin Zhang
Jiaheng Liu
Junjie Yan
J. Wu
Lidong Shi
Ling Ye
L. Yu
Mengnan Dong
N. Zhang
Ningchen Ma
Qiwei Pan
Qucheng Gong
S. Liu
Shengling Ma
Shupeng Wei
Sihan Cao
S. Huang
Tao Jiang
W. Gao
Weimin Xiong
Weiran He
Yifan Jiang
Wei Wu
Wenyang He
Xianghui Wei
Xianqing Jia
Xingzhe Wu
Xinran Xu
Xinxing Zu
Xinyu Zhou
Xuehai Pan
Y. Charles
Yang Li
Yihan Hu
Yi Liu
Y. Chen
Yejie Wang
Yibo Liu
Yidao Qin
Y. Liu
Yiran Yang
Yiping Bao
Yulun Du
Yuxin Wu
Yuzhi Wang
Zaida Zhou
Zhilin Wang
Z. Li
Zhen Zhu
Zheng Zhang
Zhexu Wang
Zhilin Yang
Zhiqi Huang
Zihao Huang
Ziyao Xu
Z. Yang
    VLM
    ALM
    OffRL
    AI4TS
    LRM

Papers citing "Kimi k1.5: Scaling Reinforcement Learning with LLMs"

50 / 110 papers shown
Disentangling Reasoning and Knowledge in Medical Large Language Models
Rahul Thapa
Qingyang Wu
Kevin Wu
Harrison Zhang
Angela Zhang
...
Joseph Boen
Shriya Reddy
Ben Athiwaratkun
Shuaiwen Leon Song
James Zou
ELM
AI4MH
LM&MA
LRM
24
0
0
16 May 2025
Can Global XAI Methods Reveal Injected Bias in LLMs? SHAP vs Rule Extraction vs RuleSHAP
Francesco Sovrano
12
0
0
16 May 2025
Visual Planning: Let's Think Only with Images
Yi Xu
Chengzu Li
Han Zhou
Xingchen Wan
Caiqi Zhang
Anna Korhonen
Ivan Vulić
LM&Ro
LRM
7
0
0
16 May 2025
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner
Wenchuan Zhang
Penghao Zhang
Jingru Guo
Tao Cheng
Jie Chen
Shuwan Zhang
Zhang Zhang
Yuhao Yi
Hong Bu
AI4TS
LRM
12
0
0
16 May 2025
HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization
Chengyu Huang
Zhengxin Zhang
Claire Cardie
LRM
9
0
0
16 May 2025
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Chenxi Whitehouse
Tianlu Wang
Ping Yu
Xian Li
Jason Weston
Ilia Kulikov
Swarnadeep Saha
ALM
ELM
LRM
19
0
0
15 May 2025
Parallel Scaling Law for Language Models
Mouxiang Chen
Binyuan Hui
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
Jianling Sun
Junyang Lin
Zhongxin Liu
MoE
LRM
37
0
0
15 May 2025
Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation
Enci Zhang
Xingang Yan
Wei Lin
Tianxiang Zhang
Qianchun Lu
LRM
28
0
0
13 May 2025
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
Xiaomi LLM-Core Team
Bingquan Xia
B. S.
Cici
Dawei Zhu
...
Yishuo Wang
Yue Yu
Zhenru Lin
Zhichao Song
Zihao Yue
MoE
ReLM
LRM
AI4CE
45
0
0
12 May 2025
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
Xiaokun Wang
Chris
Jiangbo Pei
Wei Shen
Yi Peng
...
Ai Jian
Tianyidan Xie
Xuchen Song
Yang Liu
Yahui Zhou
OffRL
LRM
28
0
0
12 May 2025
RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration
Huajie Tan
Xiaoshuai Hao
Minglan Lin
Pengwei Wang
Yaoxu Lyu
Mingyu Cao
Zhongyuan Wang
S. Zhang
LM&Ro
48
0
0
06 May 2025
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Jiarui Yao
Yifan Hao
Hanning Zhang
Hanze Dong
Wei Xiong
Nan Jiang
Tong Zhang
LRM
62
0
0
05 May 2025
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
Zhouliang Yu
Ruotian Peng
Keyi Ding
Y. K. Li
Zhongyuan Peng
...
Huajian Xin
Yifan Jiang
Yandong Wen
Ge Zhang
Weiyang Liu
LRM
128
0
0
05 May 2025
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs
Jinyan Su
Jennifer Healey
Preslav Nakov
Claire Cardie
LRM
150
0
0
30 Apr 2025
Phi-4-reasoning Technical Report
Marah Abdin
Sahaj Agarwal
Ahmed Hassan Awadallah
Vidhisha Balachandran
Harkirat Singh Behl
...
Vaishnavi Shrivastava
Vibhav Vineet
Yue Wu
Safoora Yousefi
Guoqing Zheng
ReLM
LRM
87
0
0
30 Apr 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
Cheng Chen
J. Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
58
0
0
30 Apr 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
L. Liu
...
Jianfeng Gao
Weizhu Chen
S. Wang
Simon S. Du
Yelong Shen
OffRL
ReLM
LRM
118
4
0
29 Apr 2025
OpenTCM: A GraphRAG-Empowered LLM-based System for Traditional Chinese Medicine Knowledge Retrieval and Diagnosis
Jinglin He
Yunqi Guo
Lai Kwan Lam
Waikei Leung
Lixing He
Yuanan Jiang
Chi Chiu Wang
Guoliang Xing
Hongkai Chen
34
0
0
28 Apr 2025
Even Small Reasoners Should Quote Their Sources: Introducing the Pleias-RAG Model Family
Pierre-Carl Langlais
Pavel Chizhov
Mattia Nee
Carlos Rosas Hinostroza
Matthieu Delsart
Irène Girard
Othman Hicheur
Anastasia Stasenko
Ivan P. Yamshchikov
LRM
64
0
0
25 Apr 2025
Fast-Slow Thinking for Large Vision-Language Model Reasoning
W. L. Xiao
Leilei Gan
Weilong Dai
Wanggui He
Ziwei Huang
...
Fangxun Shu
Zhelun Yu
Peng Zhang
Hao Jiang
Fei Wu
ReLM
LRM
AI4CE
164
1
0
25 Apr 2025
Kimi-Audio Technical Report
KimiTeam
Ding Ding
Zeqian Ju
Yichong Leng
S. Liu
...
Z. Yang
Aoxiong Yin
Ruibin Yuan
Yuhang Zhang
Zaida Zhou
AuLLM
VLM
110
5
0
25 Apr 2025
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
Chris
Yichen Wei
Yi Peng
Xuben Wang
Weijie Qiu
...
Jianhao Zhang
Y. Hao
Xuchen Song
Yang Liu
Yahui Zhou
OffRL
AI4TS
SyDa
LRM
VLM
79
0
0
23 Apr 2025
Sparks of Tabular Reasoning via Text2SQL Reinforcement Learning
Josefa Lia Stoisser
Marc Boubnovski Martell
Julien Fauqueur
LMTD
ReLM
AI4TS
LRM
80
0
0
23 Apr 2025
StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation
Yinmin Zhong
Zili Zhang
Xiaoniu Song
Hanpeng Hu
Chao Jin
...
Changyi Wan
Hongyu Zhou
Yimin Jiang
Yibo Zhu
Daxin Jiang
OffRL
AI4TS
57
0
0
22 Apr 2025
SARI: Structured Audio Reasoning via Curriculum-Guided Reinforcement Learning
Cheng Wen
Tingwei Guo
Shuaijiang Zhao
Wei Zou
Xiangang Li
OffRL
AuLLM
LRM
56
2
0
22 Apr 2025
Dynamic Early Exit in Reasoning Models
Chenxu Yang
Qingyi Si
Yongjie Duan
Zheliang Zhu
Chenyu Zhu
Zheng-Shen Lin
Zheng Lin
Li Cao
Weiping Wang
ReLM
LRM
34
0
0
22 Apr 2025
Learning to Reason under Off-Policy Guidance
Jianhao Yan
Yafu Li
Zican Hu
Zhi Wang
Ganqu Cui
Xiaoye Qu
Yu Cheng
Yue Zhang
OffRL
LRM
44
0
0
21 Apr 2025
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Jie Cheng
Ruixi Qiao
Lijun Li
Chao Guo
J. Z. Wang
Gang Xiong
Yisheng Lv
Fei-Yue Wang
LRM
154
1
0
21 Apr 2025
Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL
Simone Papicchio
Simone Rossi
Luca Cagliero
Paolo Papotti
ReLM
LMTD
AI4TS
LRM
53
0
0
21 Apr 2025
FlowReasoner: Reinforcing Query-Level Meta-Agents
Hongcheng Gao
Yue Liu
Yufei He
Longxu Dou
C. Du
Zhijie Deng
Bryan Hooi
Min Lin
Tianyu Pang
AIFin
LRM
29
1
0
21 Apr 2025
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
Weiye Xu
Jun Wang
Weiyun Wang
Zhe Chen
Wengang Zhou
...
Xiaohua Wang
Xizhou Zhu
Wenhai Wang
Jifeng Dai
Jinguo Zhu
VLM
LRM
55
1
0
21 Apr 2025
SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM
X. Zhang
J. Wang
Zifei Cheng
Wenhao Zhuang
Zheng Lin
...
Shouyu Yin
Chaohang Wen
Haotian Zhang
Bin Chen
Bing Yu
LRM
40
2
0
19 Apr 2025
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue
Zhiqi Chen
Rui Lu
Andrew Zhao
Zhaokai Wang
Yang Yue
Shiji Song
Gao Huang
ReLM
LRM
58
12
0
18 Apr 2025
Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models
Yule Liu
Jingyi Zheng
Zhen Sun
Zifan Peng
Wenhan Dong
Zeyang Sha
Shiwen Cui
Weiqiang Wang
Xinlei He
OffRL
LRM
42
4
0
18 Apr 2025
ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs
Yan Yang
Yixia Li
Hongru Wang
Xuetao Wei
Jianqiao Yu
Yun-Nung Chen
Guanhua Chen
MoMe
28
0
0
17 Apr 2025
Open-Medical-R1: How to Choose Data for RLVR Training at Medicine Domain
Zhongxi Qiu
Zhang Zhang
Yan Hu
Heng Li
Jiang-Dong Liu
OffRL
149
0
0
16 Apr 2025
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
Siyan Zhao
Devaansh Gupta
Qinqing Zheng
Aditya Grover
DiffM
LRM
AI4CE
42
2
0
16 Apr 2025
Efficient Reasoning Models: A Survey
Sicheng Feng
Gongfan Fang
Xinyin Ma
Xinchao Wang
ReLM
LRM
145
0
0
15 Apr 2025
Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning
Haiming Wang
Mert Unsal
Xiaohan Lin
Mantas Baksys
Jiaheng Liu
...
Zhouliang Yu
Zhilin Wang
Zhilin Yang
Zhengying Liu
Jia-Nan Li
AIMat
ReLM
AI4TS
LRM
54
5
0
15 Apr 2025
Heimdall: test-time scaling on the generative verification
Wenlei Shi
Xing Jin
LRM
29
0
0
14 Apr 2025
DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training
Zhenting Wang
Guofeng Cui
Kun Wan
Wentian Zhao
35
1
0
13 Apr 2025
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
Xingjian Zhang
Siwei Wen
Wenjun Wu
Lei Huang
LRM
37
1
0
13 Apr 2025
Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time
Wang Yang
Xiang Yue
V. Chaudhary
Xiaotian Han
ReLM
LRM
72
1
0
12 Apr 2025
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang
C. Qu
Zuming Huang
Wei Chu
Fangzhen Lin
Wenhu Chen
OffRL
ReLM
SyDa
LRM
VLM
72
1
0
10 Apr 2025
Perception-R1: Pioneering Perception Policy with Reinforcement Learning
En Yu
Kangheng Lin
Liang Zhao
Jisheng Yin
Yana Wei
...
Zheng Ge
Xiangyu Zhang
Daxin Jiang
Jingyu Wang
Wenbing Tao
VLM
OffRL
LRM
40
2
0
10 Apr 2025
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao
Alexandru Meterez
Sham Kakade
C. Pehlevan
Samy Jelassi
Eran Malach
ReLM
LRM
112
2
0
10 Apr 2025
Kimi-VL Technical Report
Kimi Team
Angang Du
B. Yin
Bowei Xing
Bowen Qu
...
Zhiqi Huang
Zihao Huang
Zijia Zhao
Zhengzhang Chen
Zongyu Lin
MLLM
VLM
MoE
204
2
0
10 Apr 2025
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Andreas Hochlehnert
Hardik Bhatnagar
Vishaal Udandarao
Samuel Albanie
Ameya Prabhu
Matthias Bethge
ReLM
ALM
LRM
100
4
0
09 Apr 2025
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
Xinhao Li
Ziang Yan
Desen Meng
Lu Dong
Xiangyu Zeng
Yinan He
Yishuo Wang
Yu Qiao
Yi Wang
Limin Wang
VLM
AI4TS
LRM
43
3
0
09 Apr 2025
On the Suitability of Reinforcement Fine-Tuning to Visual Tasks
X. Chen
Wei Li
Chunxu Liu
Chi Xie
Xiaoyan Hu
Chengqian Ma
Feng Zhu
Rui Zhao
ReLM
LRM
54
0
0
08 Apr 2025