arXiv:2305.08455

Document Understanding Dataset and Evaluation (DUDE)

15 May 2023
Jordy Van Landeghem
Rubèn Pérez Tito
Łukasz Borchmann
Michal Pietruszka
Pawel Józiak
Rafal Powalski
Dawid Jurkiewicz
Mickael Coustaty
Bertrand Ackaert
Ernest Valveny
Matthew Blaschko
Sien Moens
Tomasz Stanislawek
Abstract

We call on the Document AI (DocAI) community to reevaluate current methodologies and embrace the challenge of creating more practically oriented benchmarks. Document Understanding Dataset and Evaluation (DUDE) seeks to remedy the stalled research progress in understanding visually-rich documents (VRDs). We present a new dataset with novelties related to types of questions, answers, and document layouts, based on multi-industry, multi-domain, and multi-page VRDs of various origins and dates. Moreover, we push the boundaries of current methods by creating multi-task and multi-domain evaluation setups that more accurately simulate real-world situations where powerful generalization and adaptation under low-resource settings are desired. DUDE aims to set a new standard as a more practical, long-standing benchmark for the community, and we hope that it will lead to future extensions and contributions that address real-world challenges. Finally, our work illustrates the importance of finding more efficient ways to model language, images, and layout in DocAI.
