ResearchTrend.AI

arXiv: 2308.13149

SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research

25 August 2023
Liangtai Sun
Yang Han
Zihan Zhao
Da Ma
Zhe-Wei Shen
Baocai Chen
Lu Chen
Kai Yu
    ELM

Papers citing "SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research"

49 / 49 papers shown
Title
MatTools: Benchmarking Large Language Models for Materials Science Tools
Siyu Liu
Jiamin Xu
Beilin Ye
Bo Hu
David J. Srolovitz
Tongqi Wen
12
0
0
16 May 2025
CellVerse: Do Large Language Models Really Understand Cell Biology?
Fan Zhang
Tianyu Liu
Zhihong Zhu
Hao Wu
H. Wang
Donghao Zhou
Yefeng Zheng
Kun Wang
X. Wu
Pheng-Ann Heng
ELM
36
0
0
09 May 2025
Can LLMs Generate Tabular Summaries of Science Papers? Rethinking the Evaluation Protocol
Weiqi Wang
Jiefu Ou
Y. Song
Benjamin Van Durme
Daniel Khashabi
LMTD
42
0
0
14 Apr 2025
Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents
Shuo Ren
Pu Jian
Zhenjiang Ren
Chunlin Leng
Can Xie
Jiajun Zhang
LLMAG
AI4CE
59
1
0
31 Mar 2025
SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models
Chuan Qin
X. Chen
Chengrui Wang
Pengmin Wu
Xi Chen
...
Han Wu
C. Li
Yuanchun Zhou
H. Xiong
Hengshu Zhu
ELM
60
1
0
12 Mar 2025
Recent Advances in Large Language Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation
Simin Chen
Yiming Chen
Zexin Li
Yifan Jiang
Zhongwei Wan
...
Dezhi Ran
Tianle Gu
H. Li
Tao Xie
Baishakhi Ray
45
3
0
23 Feb 2025
Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents
Patrick Tser Jern Kon
Jiachen Liu
Qiuyi Ding
Yiming Qiu
Zhenning Yang
Yibo Huang
Jayanth Srinivasa
Myungjin Lee
Mosharaf Chowdhury
Ang Chen
56
3
0
22 Feb 2025
NOTA: Multimodal Music Notation Understanding for Visual Large Language Model
Mingni Tang
Jiajia Li
Lu Yang
Zhiqiang Zhang
Jinghao Tian
Z. Li
L. Zhang
P. Wang
56
0
0
17 Feb 2025
PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
X. Zhang
Yuxuan Dong
Y. Wu
Jiaxing Huang
Chengyou Jia
Basura Fernando
Mike Zheng Shou
L. Zhang
Jun Liu
AIMat
ReLM
LRM
53
2
0
17 Feb 2025
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning
Yibo Yan
Shen Wang
Jiahao Huo
Jingheng Ye
Zhendong Chu
Xuming Hu
Philip S. Yu
Carla P. Gomes
B. Selman
Qingsong Wen
LRM
127
9
0
05 Feb 2025
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Yilun Zhao
Lujing Xie
Haowei Zhang
Guo Gan
Yitao Long
...
Xiangru Tang
Zhenwen Liang
Y. Liu
Chen Zhao
Arman Cohan
53
5
0
21 Jan 2025
CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity
Zhengmin Yu
Jiutian Zeng
Siyi Chen
Wenhan Xu
Dandan Xu
Xiangyu Liu
Zonghao Ying
Nan Wang
Yuan Zhang
Min Yang
ELM
108
1
0
20 Jan 2025
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning
Xiangru Tang
Tianyu Hu
Muyang Ye
Yanjun Shao
Xunjian Yin
...
Pan Lu
Zhuosheng Zhang
Yilun Zhao
Arman Cohan
Mark B. Gerstein
LLMAG
LRM
AI4CE
66
6
0
11 Jan 2025
MULTI: Multimodal Understanding Leaderboard with Text and Images
Zichen Zhu
Yang Xu
Lu Chen
Jingkai Yang
Yichuan Ma
...
Yingzi Ma
Situo Zhang
Zihan Zhao
Liangtai Sun
Kai Yu
VLM
54
5
0
08 Jan 2025
MetaScientist: A Human-AI Synergistic Framework for Automated Mechanical Metamaterial Design
Jingyuan Qi
Z. Jia
Minqian Liu
Wangzhi Zhan
Junkai Zhang
...
Muhao Chen
Dawei Zhou
Ling Li
Wei Wang
Lifu Huang
AI4CE
82
1
0
20 Dec 2024
Improving Physics Reasoning in Large Language Models Using Mixture of Refinement Agents
Raj Jaiswal
Dhruv Jain
Harsh Parimal Popat
Avinash Anand
Abhishek Dharmadhikari
Atharva Marathe
R. Shah
LRM
AI4CE
90
3
0
01 Dec 2024
VersaTune: An Efficient Data Composition Framework for Training Multi-Capability LLMs
Keer Lu
Keshi Zhao
Zheng Liang
Da Pan
Shusen Zhang
...
Weipeng Chen
Zenan Zhou
Guosheng Dong
Bin Cui
Wentao Zhang
VLM
28
0
0
18 Nov 2024
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs
Yujun Zhou
Jingdong Yang
Kehan Guo
Pin-Yu Chen
Tian Gao
...
Tian Gao
Werner Geyer
Nuno Moniz
Nitesh V Chawla
Xiangliang Zhang
37
5
0
18 Oct 2024
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
Himanshu Gupta
Shreyas Verma
Ujjwala Anantheswaran
Kevin Scaria
Mihir Parmar
Swaroop Mishra
Chitta Baral
ReLM
LRM
32
4
0
06 Oct 2024
ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection
Yibo Yan
Shen Wang
Jiahao Huo
Hang Li
B. Li
...
Kun Wang
Hui Xiong
Philip S. Yu
Xuming Hu
Qingsong Wen
LRM
33
13
0
06 Oct 2024
SciDFM: A Large Language Model with Mixture-of-Experts for Science
Liangtai Sun
Danyu Luo
Da Ma
Zihan Zhao
Baocai Chen
Zhennan Shen
Su Zhu
Lu Chen
Xin Chen
Kai Yu
MoE
35
2
0
27 Sep 2024
ChemEval: A Comprehensive Multi-Level Chemical Evaluation for Large Language Models
Yuqing Huang
Rongyang Zhang
X. He
Xuyang Zhi
Hao Wang
...
Guoping Hu
Guiquan Liu
Qi Liu
Defu Lian
Enhong Chen
ELM
29
4
0
21 Sep 2024
IW-Bench: Evaluating Large Multimodal Models for Converting Image-to-Web
Hongcheng Guo
Wei Zhang
Junhao Chen
Yaonan Gu
Jian Yang
...
Binyuan Hui
Tianyu Liu
Jianxin Ma
Chang Zhou
Zhoujun Li
25
1
0
14 Sep 2024
Generative Hierarchical Materials Search
Sherry Yang
Simon L. Batzner
Ruiqi Gao
Muratahan Aykol
Alexander L. Gaunt
Brendan McMorrow
Danilo J. Rezende
Dale Schuurmans
Igor Mordatch
E. D. Cubuk
AI4CE
40
6
0
10 Sep 2024
Watermarking Techniques for Large Language Models: A Survey
Yuqing Liang
Jiancheng Xiao
Wensheng Gan
Philip S. Yu
OffRL
29
3
0
26 Aug 2024
Towards Effective and Efficient Continual Pre-training of Large Language Models
Jie Chen
Zhipeng Chen
Jiapeng Wang
Kun Zhou
Yutao Zhu
...
Rui Yan
Zhewei Wei
Di Hu
Wenbing Huang
Ji-Rong Wen
KELM
ALM
CLL
ELM
LRM
74
4
0
26 Jul 2024
NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models
Pranshu Pandya
Agney S Talwarr
Vatsal Gupta
Tushar Kataria
Dan Roth
Vivek Gupta
LRM
61
2
0
15 Jul 2024
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang
Zengzhi Wang
Shijie Xia
Xuefeng Li
Haoyang Zou
...
Yuxiang Zheng
Shaoting Zhang
Dahua Lin
Yu Qiao
Pengfei Liu
ELM
LRM
49
26
0
18 Jun 2024
Recent Advances in Federated Learning Driven Large Language Models: A Survey on Architecture, Performance, and Security
Youyang Qu
Ming Liu
Tianqing Zhu
Longxiang Gao
Shui Yu
Wanlei Zhou
MU
FedML
57
2
0
14 Jun 2024
SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models
Kehua Feng
Keyan Ding
Weijie Wang
Xiang Zhuang
Zeyuan Wang
Ming Qin
Yu Zhao
Jianhua Yao
Qiang Zhang
H. Chen
ELM
45
6
0
13 Jun 2024
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
David Wadden
Kejian Shi
Jacob Morrison
Aakanksha Naik
Shruti Singh
...
Luca Soldaini
Shannon Zejiang Shen
Doug Downey
Hannaneh Hajishirzi
Arman Cohan
47
11
0
10 Jun 2024
Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models
Kalyan Nakka
Jimmy Dani
Nitesh Saxena
43
1
0
08 Jun 2024
MAmmoTH2: Scaling Instructions from the Web
Xiang Yue
Tuney Zheng
Ge Zhang
Wenhu Chen
ALM
LRM
46
85
0
06 May 2024
Are large language models superhuman chemists?
Adrian Mirza
Nawaf Alampara
Sreekanth Kunchapu
Benedict Emoekabu
Aswanth Krishnan
...
Leanne M. Stafast
Dinga Wonanke
Michael Pieler
P. Schwaller
K. Jablonka
ELM
AI4MH
LRM
LM&MA
31
4
0
01 Apr 2024
SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis
Hengxing Cai
Xiaochen Cai
Junhan Chang
Sihang Li
Lin Yao
...
Changhong Chen
Zheng Cheng
Zifeng Zhao
Linfeng Zhang
Guolin Ke
ELM
34
24
0
04 Mar 2024
Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey
Qizhi Pei
Lijun Wu
Kaiyuan Gao
Jinhua Zhu
Yue Wang
Zun Wang
Tao Qin
Rui Yan
AI4CE
49
19
0
03 Mar 2024
A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist
Wentao Zhang
Lingxuan Zhao
Haochong Xia
Shuo Sun
Jiaze Sun
...
Yilei Zhao
Xinyu Cai
Longtao Zheng
Xinrun Wang
Bo An
AIFin
38
33
0
28 Feb 2024
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Chaoqun He
Renjie Luo
Yuzhuo Bai
Shengding Hu
Zhen Leng Thai
...
Yuxiang Zhang
Jie Liu
Lei Qi
Zhiyuan Liu
Maosong Sun
ELM
AIMat
33
137
0
21 Feb 2024
SciAgent: Tool-augmented Language Models for Scientific Reasoning
Yubo Ma
Zhibin Gou
Junheng Hao
Ruochen Xu
Shuohang Wang
...
Yujiu Yang
Yixin Cao
Aixin Sun
Hany Awadalla
Weizhu Chen
RALM
LRM
LLMAG
45
21
0
18 Feb 2024
AcademicGPT: Empowering Academic Research
Shufa Wei
Xiaolong Xu
Xianbiao Qi
Xi Yin
Jun Xia
...
Chihao Dai
Lihua Wang
Xiaohui Liu
Lei Zhang
Yutao Xie
LM&MA
39
3
0
21 Nov 2023
PsyEval: A Suite of Mental Health Related Tasks for Evaluating Large Language Models
Haoan Jin
Siyuan Chen
Dilawaier Dilixiati
Yewei Jiang
Mengyue Wu
Ke Zhu
ELM
AI4MH
LM&MA
43
4
0
15 Nov 2023
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Pan Lu
Hritik Bansal
Tony Xia
Jiacheng Liu
Chun-yue Li
Hannaneh Hajishirzi
Hao Cheng
Kai-Wei Chang
Michel Galley
Jianfeng Gao
LRM
MLLM
41
496
0
03 Oct 2023
ChatGPT-4 with Code Interpreter can be used to solve introductory college-level vector calculus and electromagnetism problems
Tanuj Kumar
M. Kats
16
9
0
16 Sep 2023
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
Xiaoxuan Wang
Ziniu Hu
Pan Lu
Yanqiao Zhu
Jieyu Zhang
Satyen Subramaniam
Arjun R. Loomba
Shichang Zhang
Yizhou Sun
Wei Wang
ELM
LRM
23
85
0
20 Jul 2023
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
Pan Lu
Baolin Peng
Hao Cheng
Michel Galley
Kai-Wei Chang
Ying Nian Wu
Song-Chun Zhu
Jianfeng Gao
KELM
MLLM
LRM
42
301
0
19 Apr 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
211
1,106
0
20 Sep 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
316
4,097
0
24 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,915
0
04 Mar 2022
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
210
812
0
13 Sep 2019