Scalable Econometrics on Big Data -- The Logistic Regression on Spark

18 June 2021
Aurelien Ouattara
Matthieu Bulté
Wanxuan Lin
Philipp Scholl
Benedikt Veit
Christos Ziakas
Florian Felice
Julien Virlogeux
George N. Dikos
Abstract

Extra-large datasets are becoming increasingly accessible, and computing tools designed to handle huge amounts of data efficiently are rapidly becoming widely available. However, conventional statistical and econometric tools still lack fluency when dealing with such large datasets. This paper examines econometrics on big datasets, focusing specifically on logistic regression in Spark. We review the robustness of the functions available in Spark for fitting logistic regression and introduce a package we developed in PySpark that returns the statistical summary of a logistic regression needed for statistical inference.
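The abstract does not show the package's code. The following pure-Python sketch is a rough illustration of what a logistic-regression "statistical summary" involves and why it can be computed on partitioned data. It fits the model by Newton-Raphson and takes standard errors from the inverse Fisher information, (X^T W X)^{-1}, with W = diag(p_i(1 - p_i)). It then reports z-statistics and two-sided p-values.

This is not the authors' PySpark package and does not use Spark. The plain lists standing in for partitions are hypothetical, as are all function names (`partition_stats`, `combine`, `invert`, `logit_summary`). The point it illustrates is that the gradient and the information matrix are sums of per-row terms. Each partition can therefore compute its share, and a reduce step adds the shares together.

```python
import math
from functools import reduce

# Hypothetical sketch: not the paper's package and not Spark.
# Plain Python lists stand in for data partitions.

def partition_stats(rows, beta):
    """Per-partition sums: gradient sum (y - p) x and information sum p(1-p) x x^T."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x, y in rows:  # x includes a leading 1.0 for the intercept
        eta = sum(b * xi for b, xi in zip(beta, x))
        p = 1.0 / (1.0 + math.exp(-eta))
        w = p * (1.0 - p)
        for i in range(k):
            grad[i] += (y - p) * x[i]
            for j in range(k):
                info[i][j] += w * x[i] * x[j]
    return grad, info

def combine(a, b):
    """Reduce step: element-wise sum of two (gradient, information) pairs."""
    (ga, ia), (gb, ib) = a, b
    return ([u + v for u, v in zip(ga, gb)],
            [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(ia, ib)])

def invert(m):
    """Gauss-Jordan inversion with partial pivoting (fine for small k x k)."""
    k = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[k:] for row in a]

def logit_summary(partitions, k, iters=25, tol=1e-10):
    """Newton-Raphson fit, then (coef, std. error, z, two-sided p) per coefficient."""
    beta = [0.0] * k
    for _ in range(iters):
        grad, info = reduce(combine, (partition_stats(p, beta) for p in partitions))
        cov = invert(info)
        step = [sum(cov[i][j] * grad[j] for j in range(k)) for i in range(k)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Covariance estimate: inverse information evaluated at the final estimate
    _, info = reduce(combine, (partition_stats(p, beta) for p in partitions))
    cov = invert(info)
    out = []
    for i in range(k):
        se = math.sqrt(cov[i][i])
        z = beta[i] / se
        out.append((beta[i], se, z, math.erfc(abs(z) / math.sqrt(2.0))))
    return out

# Usage: a tiny, non-separable dataset split into two stand-in "partitions"
data = [([1.0, float(x)], y) for x, y in [(-2, 0), (-1, 0), (0, 0), (1, 1), (2, 1),
                                          (-2, 0), (-1, 1), (0, 1), (1, 1), (2, 1)]]
partitions = [data[:5], data[5:]]
for name, (b, se, z, p) in zip(["intercept", "x"], logit_summary(partitions, k=2)):
    print(f"{name:>9}  coef={b:.4f}  se={se:.4f}  z={z:.3f}  p={p:.4f}")
```

Only k-dimensional sums cross partition boundaries, so a Spark version could in principle compute the same quantities with a map-reduce aggregation, such as PySpark's `RDD.treeAggregate`, and keep the small k × k inversion on the driver. This is an assumption about how such a package could be structured, not a description of the package presented in the paper.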
