
arXiv:2308.11462
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

20 August 2023
Neel Guha
Julian Nyarko
Daniel E. Ho
Christopher Ré
Adam Chilton
Aditya Narayana
Alex Chohlas-Wood
Austin M. K. Peters
Brandon Waldon
D. Rockmore
Diego A. Zambrano
Dmitry Talisman
E. Hoque
Faiz Surani
F. Fagan
Galit Sarfaty
Gregory M. Dickinson
Haggai Porat
Jason Hegland
Jessica Wu
Joe Nudell
Joel Niklaus
John J. Nay
Jonathan H. Choi
K. Tobia
M. Hagan
Megan Ma
Michael A. Livermore
Nikon Rasumov-Rahe
Nils Holzenberger
Noam Kolt
Peter Henderson
Sean Rehaag
Sharad Goel
Shangsheng Gao
Spencer Williams
Sunny G. Gandhi
Tomer Zur
Varun J. Iyer
Zehua Li
    AILaw
    LRM
    ELM

Papers citing "LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models"

26 / 26 papers shown
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
Yu Fan
Jingwei Ni
Jakob Merane
Etienne Salimbeni
Yang Tian
...
Mrinmaya Sachan
Alexander Stremitzer
Christoph Engel
Elliott Ash
Joel Niklaus
AILaw
ELM
18
0
0
19 May 2025
WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models
Abdullah Mushtaq
Imran Taj
Rafay Naeem
Ibrahim Ghaznavi
Junaid Qadir
26
0
0
14 May 2025
Incorporating Legal Structure in Retrieval-Augmented Generation: A Case Study on Copyright Fair Use
Justin Ho
Alexandra Colby
William Fisher
AILaw
44
0
0
04 May 2025
LawFlow: Collecting and Simulating Lawyers' Thought Processes
Debarati Das
Khanh Chi Le
R. Parkar
Karin de Langis
Brendan Madson
...
Robin M. Willis
Daniel H. Moses
Brett McDonnell
Daniel Schwarcz
Dongyeop Kang
AILaw
173
0
0
26 Apr 2025
DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain
Miracle Master
Rainy Sun
Anya Reese
Joey Ouyang
Alex Chen
...
James Yi
Garry Zhao
Tony Ling
Hobert Wong
Lowes Yang
ALM
ELM
77
0
0
18 Apr 2025
CPG-EVAL: A Multi-Tiered Benchmark for Evaluating the Chinese Pedagogical Grammar Competence of Large Language Models
Dong Wang
ELM
33
0
0
17 Apr 2025
Are Language Models Up to Sequential Optimization Problems? From Evaluation to a Hegelian-Inspired Enhancement
Soheil Abbasloo
LRM
44
0
0
04 Feb 2025
LicenseGPT: A Fine-tuned Foundation Model for Publicly Available Dataset License Compliance
Jingwen Tan
Gopi Krishnan Rajbahadur
Zi Li
Xiangfu Song
Jianshan Lin
Dan Li
Zibin Zheng
Ahmed E. Hassan
54
1
0
03 Jan 2025
A Picture is Worth A Thousand Numbers: Enabling LLMs Reason about Time Series via Visualization
Haoxin Liu
Chenghao Liu
B. Prakash
AI4TS
LRM
93
7
0
09 Nov 2024
In Context Learning and Reasoning for Symbolic Regression with Large Language Models
Samiha Sharlin
Tyler R. Josephson
ReLM
LLMAG
LRM
47
1
0
22 Oct 2024
Weak-to-Strong Generalization beyond Accuracy: a Pilot Study in Safety, Toxicity, and Legal Reasoning
Ruimeng Ye
Yang Xiao
Bo Hui
ALM
ELM
OffRL
29
2
0
16 Oct 2024
Enterprise Benchmarks for Large Language Model Evaluation
Bing Zhang
Mikio Takeuchi
Ryo Kawahara
Shubhi Asthana
Md. Maruf Hossain
Guang-Jie Ren
Kate Soule
Yada Zhu
ELM
42
2
0
11 Oct 2024
FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering
Siqiao Xue
Tingting Chen
Fan Zhou
Qingyang Dai
Zhixuan Chu
Hongyuan Mei
41
4
0
06 Oct 2024
Evaluating the Correctness of Inference Patterns Used by LLMs for Judgment
Lu Chen
Yuxuan Huang
Yixing Li
Dongrui Liu
Qihan Ren
Shuai Zhao
Kun Kuang
Zilong Zheng
Quanshi Zhang
31
1
0
06 Oct 2024
Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely
Siyun Zhao
Yuqing Yang
Zilong Wang
Zhiyuan He
Luna Qiu
Lili Qiu
SyDa
RALM
3DV
44
35
0
23 Sep 2024
Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data
Atilla Akkus
Mingjie Li
Junjie Chu
Michael Backes
Sinem Sav
SILM
SyDa
48
1
0
12 Sep 2024
HyPA-RAG: A Hybrid Parameter Adaptive Retrieval-Augmented Generation System for AI Legal and Policy Applications
Rishi Kalra
Zekun Wu
Ayesha Gulley
Airlie Hilliard
Xin Guan
Adriano Soares Koshiyama
Philip C. Treleaven
RALM
AILaw
54
5
0
29 Aug 2024
ArabLegalEval: A Multitask Benchmark for Assessing Arabic Legal Knowledge in Large Language Models
Faris Hijazi
Somayah Alharbi
Abdulaziz AlHussein
Harethah Shairah
Reem Alzahrani
Hebah Alshamlan
Omar Knio
G. Turkiyyah
AILaw
ELM
52
2
0
15 Aug 2024
What is the best model? Application-driven Evaluation for Large Language Models
Shiguo Lian
Kaikai Zhao
Xinhui Liu
Xuejiao Lei
Bikun Yang
Wenjing Zhang
Kai Wang
Zhaoxiang Liu
ALM
ELM
40
2
0
14 Jun 2024
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Xiaoshuai Song
Muxi Diao
Guanting Dong
Zhengyang Wang
Yujia Fu
...
Yejie Wang
Zhuoma Gongque
Jianing Yu
Qiuna Tan
Weiran Xu
ELM
55
11
0
12 Jun 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELM
ALM
LM&MA
105
31
0
09 Jun 2024
Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models
Kalyan Nakka
Jimmy Dani
Nitesh Saxena
48
1
0
08 Jun 2024
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
Varun Magesh
Faiz Surani
Matthew Dahl
Mirac Suzgun
Christopher D. Manning
Daniel E. Ho
HILM
ELM
AILaw
27
66
0
30 May 2024
How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments
Jen-tse Huang
E. Li
Man Ho Lam
Tian Liang
Wenxuan Wang
Youliang Yuan
Wenxiang Jiao
Xing Wang
Zhaopeng Tu
Michael R. Lyu
ELM
LLMAG
88
33
0
18 Mar 2024
LawBench: Benchmarking Legal Knowledge of Large Language Models
Zhiwei Fei
Xiaoyu Shen
D. Zhu
Fengzhe Zhou
Zhuo Han
Songyang Zhang
Kai-xiang Chen
Zongwen Shen
Jidong Ge
ELM
AILaw
34
34
0
28 Sep 2023
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English
Ilias Chalkidis
Abhik Jana
D. Hartung
M. Bommarito
Ion Androutsopoulos
Daniel Martin Katz
Nikolaos Aletras
AILaw
ELM
130
249
0
03 Oct 2021