SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models


12 March 2025
Chuan Qin
Xiusi Chen
Chengrui Wang
Pengmin Wu
Xi Chen
Yihang Cheng
Jingyi Zhao
Meng Xiao
Xiangchao Dong
Qingqing Long
Boya Pan
Han Wu
Chong Li
Yuanchun Zhou
H. Xiong
Hengshu Zhu
    ELM

Papers citing "SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models"

40 / 40 papers shown
FastFT: Accelerating Reinforced Feature Transformation via Advanced Exploration Strategies
Tianqi He
Xiaohan Huang
Yi Du
Qingqing Long
Ziyue Qiao
Min-Ying Wu
Yanjie Fu
Yuanchun Zhou
Meng Xiao
OffRL
129
2
0
26 Mar 2025
AI Data Readiness Inspector (AIDRIN) for Quantitative Assessment of Data Readiness for AI
Kaveen Hiniduma
Suren Byna
J. L. Bez
Ravi Madduri
64
7
0
27 Jun 2024
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
Hongwei Liu
Zilong Zheng
Yuxuan Qiao
Haodong Duan
Zhiwei Fei
Fengzhe Zhou
Wenwei Zhang
Songyang Zhang
Dahua Lin
Kai-xiang Chen
75
62
0
20 May 2024
Enhancing Question Answering for Enterprise Knowledge Bases using Large Language Models
Feihu Jiang
Chuan Qin
Kaichun Yao
Chuyu Fang
Fuzhen Zhuang
Hengshu Zhu
Hui Xiong
57
5
0
10 Apr 2024
Data Readiness for AI: A 360-Degree Survey
Kaveen Hiniduma
Suren Byna
J. L. Bez
54
8
0
08 Apr 2024
Hypothesis Generation with Large Language Models
Yangqiaoyu Zhou
Haokun Liu
Tejes Srivastava
Hongyuan Mei
Chenhao Tan
LRM
46
36
0
05 Apr 2024
Inference to the Best Explanation in Large Language Models
Dhairya Dalal
Marco Valentino
André Freitas
Paul Buitelaar
LRM
ELM
61
3
0
16 Feb 2024
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein
Betty Li Hou
Asa Cooper Stickland
Jackson Petty
Richard Yuanzhe Pang
Julien Dirani
Julian Michael
Samuel R. Bowman
AI4MH
ELM
64
584
0
20 Nov 2023
Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators
Liang Chen
Yang Deng
Yatao Bian
Zeyu Qin
Bingzhe Wu
Tat-Seng Chua
Kam-Fai Wong
HILM
ELM
77
46
0
11 Oct 2023
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Pan Lu
Hritik Bansal
Tony Xia
Jiacheng Liu
Chun-yue Li
Hannaneh Hajishirzi
Hao Cheng
Kai-Wei Chang
Michel Galley
Jianfeng Gao
LRM
MLLM
67
541
0
03 Oct 2023
FELM: Benchmarking Factuality Evaluation of Large Language Models
Shiqi Chen
Yiran Zhao
Jinghan Zhang
Ethan Chern
Siyang Gao
Pengfei Liu
Junxian He
HILM
56
36
0
01 Oct 2023
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
Liangtai Sun
Yang Han
Zihan Zhao
Da Ma
Zhe-Wei Shen
Baocai Chen
Lu Chen
Kai Yu
ELM
47
80
0
25 Aug 2023
CMB: A Comprehensive Medical Benchmark in Chinese
Xidong Wang
Guiming Hardy Chen
Dingjie Song
Zhiyi Zhang
Zhihong Chen
...
Feng Jiang
Jianquan Li
Xiang Wan
Benyou Wang
Haizhou Li
LM&MA
ELM
AI4MH
56
80
0
17 Aug 2023
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
Xiaoxuan Wang
Ziniu Hu
Pan Lu
Yanqiao Zhu
Jieyu Zhang
Satyen Subramaniam
Arjun R. Loomba
Shichang Zhang
Yizhou Sun
Wei Wang
ELM
LRM
30
98
0
20 Jul 2023
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
91
1,592
0
06 Jul 2023
CMMLU: Measuring massive multitask language understanding in Chinese
Haonan Li
Yixuan Zhang
Fajri Koto
Yifei Yang
Hai Zhao
Yeyun Gong
Nan Duan
Tim Baldwin
ALM
ELM
60
253
0
15 Jun 2023
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
Zhouhong Gu
Xiaoxuan Zhu
Haoning Ye
Lin Zhang
Jianchen Wang
...
Zili Wang
Shusen Wang
Weiguo Zheng
Hongwei Feng
Yanghua Xiao
ALM
ELM
111
60
0
09 Jun 2023
GEO-Bench: Toward Foundation Models for Earth Monitoring
Alexandre Lacoste
Nils Lehmann
Pau Rodríguez López
Evan D. Sherwin
Hannah Kerner
...
David Vazquez
Dava Newman
Yoshua Bengio
Stefano Ermon
Xiao Xiang Zhu
SSL
ALM
AI4CE
40
60
0
06 Jun 2023
A Survey on Large Language Models for Recommendation
Likang Wu
Zhilan Zheng
Zhaopeng Qiu
Hao Wang
Hongchao Gu
...
Chen Zhu
Hengshu Zhu
Qi Liu
Hui Xiong
Enhong Chen
106
379
0
31 May 2023
What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks
Taicheng Guo
Kehan Guo
B. Nan
Zhengwen Liang
Zhichun Guo
Nitesh Chawla
Olaf Wiest
Xiangliang Zhang
ELM
93
136
0
27 May 2023
Have LLMs Advanced Enough? A Challenging Problem Solving Benchmark For Large Language Models
Daman Arora
H. Singh
Mausam
ELM
LRM
64
52
0
24 May 2023
TheoremQA: A Theorem-driven Question Answering dataset
Wenhu Chen
Ming Yin
Max Ku
Pan Lu
Yixin Wan
Xueguang Ma
Jianyu Xu
Xinyi Wang
Tony Xia
AIMat
54
129
0
21 May 2023
C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models
Yuzhen Huang
Yuzhuo Bai
Zhihao Zhu
Junlei Zhang
Jinghan Zhang
...
Yikai Zhang
Jiayi Lei
Yao Fu
Maosong Sun
Junxian He
ELM
LRM
45
519
0
15 May 2023
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
Wanjun Zhong
Ruixiang Cui
Yiduo Guo
Yaobo Liang
Shuai Lu
Yanlin Wang
Amin Saied
Weizhu Chen
Nan Duan
ALM
ELM
48
516
0
13 Apr 2023
Can ChatGPT be used to generate scientific hypotheses?
Yang Jeong Park
Daniel Kaplan
Zhichu Ren
Chia-Wei Hsu
Changhao Li
Haowei Xu
Sipei Li
Ju Li
LRM
36
40
0
30 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
593
12,840
0
27 Feb 2023
Mathematical Capabilities of ChatGPT
Simon Frieder
Luca Pinchetti
Alexis Chevalier
Ryan-Rhys Griffiths
Tommaso Salvatori
Thomas Lukasiewicz
P. Petersen
Julius Berner
ELM
AI4MH
94
412
0
31 Jan 2023
Large Language Models Encode Clinical Knowledge
K. Singhal
Shekoofeh Azizi
T. Tu
S. S. Mahdavi
Jason W. Wei
...
A. Rajkomar
Joelle Barral
Christopher Semturs
Alan Karthikesalingam
Vivek Natarajan
LM&MA
ELM
AI4MH
99
2,258
0
26 Dec 2022
FAIR for AI: An interdisciplinary and international community building perspective
Eliu A. Huerta
Ben Blaiszik
L. C. Brinson
Kristofer E Bouchard
Daniel Madrigal Diaz
...
Fotis Psomopoulos
Avik Roy
Oliver Rübel
Zhizhen Zhao
Ruike Zhu
45
42
0
30 Sep 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
226
1,188
0
20 Sep 2022
The Privacy Onion Effect: Memorization is Relative
Nicholas Carlini
Matthew Jagielski
Chiyuan Zhang
Nicolas Papernot
Andreas Terzis
Florian Tramèr
PILM
MIACV
110
104
0
21 Jun 2022
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Aarohi Srivastava
Abhinav Rastogi
Abhishek Rao
Abu Awal Md Shoeb
Abubakar Abid
...
Zhuoye Zhao
Zijian Wang
Zijie J. Wang
Zirui Wang
Ziyi Wu
ELM
69
1,726
0
09 Jun 2022
MedMCQA : A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering
Ankit Pal
Logesh Kumar Umapathi
Malaikannan Sankarasubbu
ELM
LM&MA
48
321
0
27 Mar 2022
MedMNIST v2 -- A large-scale lightweight benchmark for 2D and 3D biomedical image classification
Jiancheng Yang
Rui Shi
D. Wei
Zequan Liu
Lin Zhao
B. Ke
Hanspeter Pfister
Bingbing Ni
VLM
241
675
0
27 Oct 2021
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
183
4,175
0
27 Oct 2021
What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams
Di Jin
Eileen Pan
Nassim Oufattole
W. Weng
Hanyi Fang
Peter Szolovits
FaML
ELM
LM&MA
72
749
0
28 Sep 2020
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
ELM
RALM
132
4,222
0
07 Sep 2020
MODMA dataset: a Multi-modal Open Dataset for Mental-disorder Analysis
Hanshu Cai
Y. Gao
Shuting Sun
Na Li
Fuze Tian
...
Jing Yang
Lan Zhang
Xiping Hu
Yumin Li
Bin Hu
28
153
0
20 Feb 2020
MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms
Aida Amini
Saadia Gabriel
Shanchuan Lin
Rik Koncel-Kedziorski
Yejin Choi
Hannaneh Hajishirzi
AIMat
ReLM
AI4CE
90
553
0
30 May 2019
Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems
Wang Ling
Dani Yogatama
Chris Dyer
Phil Blunsom
AIMat
49
701
0
11 May 2017