ResearchTrend.AI



Synthetic Data Augmentation for Table Detection: Re-evaluating TableNet's Performance with Automatically Generated Document Images

17 June 2025
Krishna Sahukara
Zineddine Bettouche
Andreas Fischer
    LMTD, ViT
Main: 4 pages, 6 figures, 3 tables
Bibliography: 1 page
Abstract

Document pages captured by smartphones or scanners often contain tables, yet manual extraction is slow and error-prone. We introduce an automated LaTeX-based pipeline that synthesizes realistic two-column pages with visually diverse table layouts and aligned ground-truth masks. The generated corpus augments the real-world Marmot benchmark and enables a systematic resolution study of TableNet. TableNet trained on our synthetic data achieves a pixel-wise XOR error of 4.04% on our synthetic test set at 256x256 input resolution, and 4.33% at 1024x1024. On the Marmot benchmark, the best result is 9.18% (at 256x256). Because page generation and mask alignment are automated, the pipeline also reduces manual annotation effort.
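The abstract reports results as a pixel-wise XOR error but does not define it. A minimal sketch, assuming the metric is the percentage of pixels on which the predicted binary table mask and the ground-truth mask disagree; the function name `xor_error` and the toy masks are hypothetical and are not taken from the paper:

```python
import numpy as np

def xor_error(pred_mask, true_mask):
    """Percentage of pixels where two binary masks disagree.

    Assumed definition: the abstract names the metric but does not spell it out.
    """
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    if pred.shape != true.shape:
        raise ValueError("masks must have the same shape")
    # XOR is True exactly where one mask marks a table pixel and the other does not.
    return 100.0 * np.logical_xor(pred, true).mean()

# Toy 4x4 page: the ground-truth table covers a 2x2 block (4 pixels).
true = np.zeros((4, 4), dtype=bool)
true[1:3, 1:3] = True

# The prediction overshoots by one column (6 pixels).
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True

print(xor_error(pred, true))  # 2 of 16 pixels differ -> 12.5
```

Under this assumed definition, lower is better, and false positives and false negatives are penalized equally.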

@article{sahukara2025_2506.14583,
  title={Synthetic Data Augmentation for Table Detection: Re-evaluating TableNet's Performance with Automatically Generated Document Images},
  author={Krishna Sahukara and Zineddine Bettouche and Andreas Fischer},
  journal={arXiv preprint arXiv:2506.14583},
  year={2025}
}