arXiv:2207.00220
Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
1 July 2022
Peter Henderson
M. Krass
Lucia Zheng
Neel Guha
Christopher D. Manning
Dan Jurafsky
Daniel E. Ho
AILaw
ELM
Papers citing
"Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset"
25 / 25 papers shown
Title
Learning Dynamics in Continual Pre-Training for Large Language Models
Xingjin Wang
Howe Tissue
Lu Wang
Linjing Li
D. Zeng
CLL
34
0
0
12 May 2025
A Perplexity and Menger Curvature-Based Approach for Similarity Evaluation of Large Language Models
Yuantao Zhang
Zhankui Yang
AAML
38
0
0
05 Apr 2025
LRAGE: Legal Retrieval Augmented Generation Evaluation Tool
Minhu Park
Hongseok Oh
Eunkyung Choi
Wonseok Hwang
AILaw
RALM
ELM
115
0
0
02 Apr 2025
Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training
Jaydeep Borkar
Matthew Jagielski
Katherine Lee
Niloofar Mireshghallah
David A. Smith
Christopher A. Choquette-Choo
PILM
85
1
0
24 Feb 2025
Natural Language Processing for the Legal Domain: A Survey of Tasks, Datasets, Models, and Challenges
Farid Ariai
Gianluca Demartini
ELM
AILaw
VLM
45
4
0
25 Oct 2024
Retro-li: Small-Scale Retrieval Augmented Generation Supporting Noisy Similarity Searches and Domain Shift Generalization
Gentiana Rashiti
G. Karunaratne
Mrinmaya Sachan
Abu Sebastian
Abbas Rahimi
RALM
44
0
0
12 Sep 2024
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
Varun Magesh
Faiz Surani
Matthew Dahl
Mirac Suzgun
Christopher D. Manning
Daniel E. Ho
HILM
ELM
AILaw
29
66
0
30 May 2024
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
Minghan Li
Xilun Chen
Ari Holtzman
Beidi Chen
Jimmy Lin
Wen-tau Yih
Xi Lin
RALM
BDL
108
10
0
29 May 2024
Data Authenticity, Consent, & Provenance for AI are all broken: what will it take to fix them?
Shayne Longpre
Robert Mahari
Naana Obeng-Marnu
William Brannon
Tobin South
Katy Gero
Sandy Pentland
Jad Kabbara
66
5
0
19 Apr 2024
SaulLM-7B: A pioneering Large Language Model for Law
Pierre Colombo
T. Pires
Malik Boudiaf
Dominic Culver
Rui Melo
...
Andre F. T. Martins
Fabrizio Esposito
Vera Lúcia Raposo
Sofia Morgado
Michael Desa
ELM
AILaw
54
66
0
06 Mar 2024
Walking a Tightrope -- Evaluating Large Language Models in High-Risk Domains
Chia-Chien Hung
Wiem Ben-Rim
Lindsay Frost
Lars Bruckner
Carolin (Haas) Lawrence
AILaw
ALM
ELM
25
9
0
25 Nov 2023
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
Sewon Min
Suchin Gururangan
Eric Wallace
Hannaneh Hajishirzi
Noah A. Smith
Luke Zettlemoyer
AILaw
28
63
0
08 Aug 2023
Training Data Extraction From Pre-trained Language Models: A Survey
Shotaro Ishihara
37
46
0
25 May 2023
Legal Extractive Summarization of U.S. Court Opinions
Emmanuel J. Bauer
Dominik Stammbach
Nianlong Gu
Elliott Ash
AILaw
ELM
39
8
0
15 May 2023
The MiniPile Challenge for Data-Efficient Language Models
Jean Kaddour
MoE
ALM
28
40
0
17 Apr 2023
Making a Computational Attorney
Dell Zhang
Frank Schilder
Jack G. Conrad
Masoud Makrehchi
David von Rickenbach
Isabelle Moulinier
27
1
0
07 Mar 2023
Incorporating Context into Subword Vocabularies
Shaked Yehezkel
Yuval Pinter
47
8
0
13 Oct 2022
Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans
John J. Nay
ELM
AILaw
88
27
0
14 Sep 2022
Pre-trained Language Models for the Legal Domain: A Case Study on Indian Law
Shounak Paul
A. Mandal
Pawan Goyal
Saptarshi Ghosh
AILaw
VLM
ELM
40
45
0
13 Sep 2022
Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection
Suchin Gururangan
Dallas Card
Sarah K. Drier
E. K. Gade
Leroy Z. Wang
Zeyu Wang
Luke Zettlemoyer
Noah A. Smith
175
74
0
25 Jan 2022
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English
Ilias Chalkidis
Abhik Jana
D. Hartung
M. Bommarito
Ion Androutsopoulos
Daniel Martin Katz
Nikolaos Aletras
AILaw
ELM
130
250
0
03 Oct 2021
Towards generalisable hate speech detection: a review on obstacles and solutions
Wenjie Yin
A. Zubiaga
117
164
0
17 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
282
2,007
0
31 Dec 2020
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
290
1,831
0
14 Dec 2020
A Benchmark for Lease Contract Review
Spyretta Leivaditi
Julien Rossi
Evangelos Kanoulas
AILaw
111
36
0
20 Oct 2020