2304.09433
Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes
19 April 2023
Simran Arora
Brandon Yang
Sabri Eyuboglu
A. Narayan
Andrew Hojel
Immanuel Trummer
Christopher Ré
SyDa
Papers citing "Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes"
50 / 61 papers shown
Title
ATLAS: Learning to Optimally Memorize the Context at Test Time
Ali Behrouz
Zeman Li
Praneeth Kacham
Majid Daliri
Yuan Deng
Peilin Zhong
Meisam Razaviyayn
Vahab Mirrokni
39
0
0
29 May 2025
SQUiD: Synthesizing Relational Databases from Unstructured Text
Mushtari Sadia
Zhenning Yang
Yunming Xiao
Ang Chen
Amrita Roy Chowdhury
SyDa
37
0
0
25 May 2025
How Does Sequence Modeling Architecture Influence Base Capabilities of Pre-trained Language Models? Exploring Key Architecture Design Principles to Avoid Base Capabilities Degradation
Xin Lu
Yanyan Zhao
Si Wei
Shijin Wang
Bing Qin
Ting Liu
30
0
0
24 May 2025
Efficient LLM Serving on Hybrid Real-time and Best-effort Requests
Wan Borui
Zhao Juntao
Jiang Chenyu
Guo Chuanxiong
Wu Chuan
VLM
114
1
0
13 Apr 2025
Simplifying Data Integration: SLM-Driven Systems for Unified Semantic Queries Across Heterogeneous Databases
Teng Lin
47
0
0
08 Apr 2025
LLM-Aided Customizable Profiling of Code Data Based On Programming Language Concepts
Pankaj Thorat
Adnan Qidwai
Adrija Dhar
Aishwariya Chakraborty
Anand Eswaran
Hima Patel
Praveen Jayachandran
71
0
0
19 Mar 2025
Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models
A. Narayan
D. Biderman
Sabri Eyuboglu
Avner May
Scott W. Linderman
James Zou
Christopher Ré
80
2
0
21 Feb 2025
Graph-based Retrieval Augmented Generation for Dynamic Few-shot Text Classification
Yubo Wang
Haoyang Li
Fei Teng
Lei Chen
143
1
0
17 Feb 2025
CodeMonkeys: Scaling Test-Time Compute for Software Engineering
Ryan Ehrlich
Bradley Brown
Jordan Juravsky
Ronald Clark
Christopher Ré
Azalia Mirhoseini
77
10
0
24 Jan 2025
Mind the Data Gap: Bridging LLMs to Enterprise Data Integration
Moe Kayali
Fabian Wenz
Nesime Tatbul
Çağatay Demiralp
81
2
0
31 Dec 2024
The Design of an LLM-powered Unstructured Analytics System
Eric Anderson
Jonathan Fritz
Austin Lee
Bohou Li
Mark Lindblad
...
Mehul A. Shah
Benjamin Sowell
Dan Tecuci
Vinayak Thapliyal
Matt Welsh
76
12
0
31 Dec 2024
Smoothie: Label Free Language Model Routing
Neel Guha
Mayee F. Chen
Trevor Chow
Ishan S. Khare
Christopher Ré
106
4
0
06 Dec 2024
Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues
Riccardo Grazzi
Julien N. Siems
Jörg Franke
Arber Zela
Frank Hutter
Massimiliano Pontil
131
16
0
19 Nov 2024
DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
Shreya Shankar
Tristan Chambers
Eugene Wu
Aditya G. Parameswaran
LLMAG
90
7
0
16 Oct 2024
Reward-Robust RLHF in LLMs
Yuzi Yan
Xingzhou Lou
Jialian Li
Yiping Zhang
Jian Xie
Chao Yu
Yu Wang
Dong Yan
Yuan Shen
73
13
0
18 Sep 2024
Large Language Models are Pattern Matchers: Editing Semi-Structured and Structured Documents with ChatGPT
Irene Weber
KELM
AI4MH
51
1
0
12 Sep 2024
Longhorn: State Space Models are Amortized Online Learners
Bo Liu
Rui Wang
Lemeng Wu
Yihao Feng
Peter Stone
Qian Liu
70
13
0
19 Jul 2024
A Declarative System for Optimizing AI Workloads
Chunwei Liu
Matthew Russo
Michael Cafarella
Lei Cao
Peter Baile Chen
Zui Chen
Michael Franklin
Tim Kraska
Samuel Madden
Gerardo Vitagliano
62
23
0
23 May 2024
Chameleon: Foundation Models for Fairness-aware Multi-modal Data Augmentation to Enhance Coverage of Minorities
Mahdi Erfanian
H. V. Jagadish
Abolfazl Asudeh
51
3
0
02 Feb 2024
Gated Linear Attention Transformers with Hardware-Efficient Training
Aaron Courville
Bailin Wang
Songlin Yang
Yikang Shen
Yoon Kim
69
161
0
11 Dec 2023
Jellyfish: A Large Language Model for Data Preprocessing
Haochen Zhang
Yuyang Dong
Chuan Xiao
Masafumi Oyamada
70
26
0
04 Dec 2023
SEED: Domain-Specific Data Curation With Large Language Models
Zui Chen
Lei Cao
Samuel Madden
Tim Kraska
Zeyuan Shang
Ju Fan
Nan Tang
Zihui Gu
Chunwei Liu
Michael Cafarella
55
7
0
01 Oct 2023
Generative Benchmark Creation for Table Union Search
Koyena Pal
Aamod Khatiwada
Roee Shraga
Renée J. Miller
58
0
0
07 Aug 2023
TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage
Jingqing Ruan
Yihong Chen
Bin Zhang
Zhiwei Xu
Tianpeng Bao
...
Shiwei Shi
Hangyu Mao
Ziyue Li
Xingyu Zeng
Rui Zhao
LLMAG
LM&Ro
68
33
0
07 Aug 2023
Embedding-based Retrieval with LLM for Effective Agriculture Information Extracting from Unstructured Data
Ruoling Peng
Kang Liu
Po Yang
Zhipeng Yuan
Shunbao Li
35
28
0
06 Aug 2023
CHORUS: Foundation Models for Unified Data Discovery and Exploration
Moe Kayali
A. Lykov
Ilias Fountalis
N. Vasiloglou
Dan Olteanu
Dan Suciu
53
22
0
16 Jun 2023
Large Language Models as Tool Makers
Tianle Cai
Xuezhi Wang
Tengyu Ma
Xinyun Chen
Denny Zhou
LLMAG
53
200
0
26 May 2023
Enabling and Analyzing How to Efficiently Extract Information from Hybrid Long Documents with LLMs
C. Yue
Xinru Xu
Xiaojun Ma
Lun Du
Hengyu Liu
Zhiming Ding
Yanbing Jiang
Shi Han
Dongmei Zhang
37
4
0
24 May 2023
From Words to Code: Harnessing Data for Program Synthesis from Natural Language
Anirudh Khatry
Joyce Cahoon
Jordan Henkel
Shaleen Deep
Venkatesh Emani
...
Vu Le
Mohammad Raza
Sherry Shi
Mukul Singh
A. Tiwari
67
12
0
02 May 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
175
383
0
13 Mar 2023
Construction of Knowledge Graphs: State and Challenges
Marvin Hofer
Daniel Obraczka
A. Saeedi
Hanna Köpcke
Erhard Rahm
44
33
0
22 Feb 2023
ChatGPT: Jack of all trades, master of none
Jan Kocoń
Igor Cichecki
Oliwier Kaszyca
Mateusz Kochanek
Dominika Szydło
...
Maciej Piasecki
Lukasz Radliński
Konrad Wojtasik
Stanislaw Woźniak
Przemyslaw Kazienko
AI4MH
77
546
0
21 Feb 2023
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP
Omar Khattab
Keshav Santhanam
Xiang Lisa Li
David Leo Wright Hall
Percy Liang
Christopher Potts
Matei A. Zaharia
RALM
KELM
66
260
0
28 Dec 2022
DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation
Yuhang Lai
Chengxi Li
Yiming Wang
Tianyi Zhang
Ruiqi Zhong
Luke Zettlemoyer
Scott Yih
Daniel Fried
Si-yi Wang
Tao Yu
ELM
ALM
74
323
0
18 Nov 2022
Ask Me Anything: A simple strategy for prompting language models
Simran Arora
A. Narayan
Mayee F. Chen
Laurel J. Orr
Neel Guha
Kush S. Bhatia
Ines Chami
Frederic Sala
Christopher Ré
ReLM
LRM
245
211
0
05 Oct 2022
Operationalizing Machine Learning: An Interview Study
Shreya Shankar
Rolando Garcia
J. M. Hellerstein
Aditya G. Parameswaran
91
51
0
16 Sep 2022
Large Language Models are Few-Shot Clinical Information Extractors
Monica Agrawal
S. Hegselmann
Hunter Lang
Yoon Kim
David Sontag
BDL
LM&MA
199
343
0
25 May 2022
A Survey on Neural Open Information Extraction: Current Status and Future Directions
Shaowen Zhou
Yu Bowen
Aixin Sun
Cheng Long
Jingyang Li
Haiyang Yu
Jianguo Sun
Yongbin Li
67
32
0
24 May 2022
Can Foundation Models Wrangle Your Data?
A. Narayan
Ines Chami
Laurel J. Orr
Simran Arora
Christopher Ré
LMTD
AI4CE
212
214
0
20 May 2022
Language Models in the Loop: Incorporating Prompting into Weak Supervision
Ryan Smith
Jason Alan Fries
Braden Hancock
Stephen H. Bach
76
56
0
04 May 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
489
3,486
0
21 Mar 2022
Reasoning over Public and Private Data in Retrieval-Based Systems
Simran Arora
Patrick Lewis
Angela Fan
Jacob Kahn
Christopher Ré
42
23
0
14 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
728
12,525
0
04 Mar 2022
A Survey on Retrieval-Augmented Text Generation
Huayang Li
Yixuan Su
Deng Cai
Yan Wang
Lemao Liu
RALM
104
205
0
02 Feb 2022
DOM-LM: Learning Generalizable Representations for HTML Documents
Xiang Deng
Prashant Shiralkar
Colin Lockard
Binxuan Huang
Huan Sun
AI4TS
AI4CE
61
37
0
25 Jan 2022
A General Language Assistant as a Laboratory for Alignment
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
...
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
ALM
103
762
0
01 Dec 2021
AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts
Tongshuang Wu
Michael Terry
Carrie J. Cai
LLMAG
AI4CE
LRM
67
452
0
04 Oct 2021
Can Deep Neural Networks Predict Data Correlations from Column Names?
Immanuel Trummer
47
8
0
09 Jul 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
404
2,051
0
31 Dec 2020
Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling
Benedikt Boecking
Willie Neiswanger
Eric Xing
A. Dubrawski
NoLa
OffRL
53
69
0
11 Dec 2020