ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.16265
29
8

UNITE: A Unified Benchmark for Text-to-SQL Evaluation

25 May 2023
Wuwei Lan
Zhiguo Wang
Anuj Chauhan
Henghui Zhu
Alexander Hanbo Li
Jiang Guo
Shenmin Zhang
Chung-Wei Hang
Joseph Lilien
Yiqun Hu
Lin Pan
Mingwen Dong
Jun Wang
Jiarong Jiang
Stephen M. Ash
Vittorio Castelli
Patrick Ng
Bing Xiang
    ELM
    LMTD
ArXivPDFHTML
Abstract

A practical text-to-SQL system should generalize well on a wide variety of natural language questions, unseen database schemas, and novel SQL query structures. To comprehensively evaluate text-to-SQL systems, we introduce a UNIfied benchmark for Text-to-SQL Evaluation (UNITE). It is composed of publicly available text-to-SQL datasets, containing natural language questions from more than 12 domains, SQL queries from more than 3.9K patterns, and 29K databases. Compared to the widely used Spider benchmark, we introduce ∼\sim∼120K additional examples and a threefold increase in SQL patterns, such as comparative and boolean questions. We conduct a systematic study of six state-of-the-art (SOTA) text-to-SQL parsers on our new benchmark and show that: 1) Codex performs surprisingly well on out-of-domain datasets; 2) specially designed decoding methods (e.g. constrained beam search) can improve performance for both in-domain and out-of-domain settings; 3) explicitly modeling the relationship between questions and schemas further improves the Seq2Seq models. More importantly, our benchmark presents key challenges towards compositional generalization and robustness issues -- which these SOTA models cannot address well. Our code and data processing script are available at https://github.com/awslabs/unified-text2sql-benchmark

View on arXiv
Comments on this paper