Mat2Stencil: A Modular Matrix-Based DSL for Explicit and Implicit Matrix-Free PDE Solvers on Structured Grid

28 April 2022
Huanqi Cao, Shizhi Tang, Qianchao Zhu, Bowen Yu, Wenguang Chen
arXiv: 2204.13304 (abs · PDF · HTML)
Abstract

Partial differential equation (PDE) solvers are extensively utilized across numerous scientific and engineering fields. However, achieving high performance and scalability often necessitates intricate and low-level programming, particularly when leveraging deterministic sparsity patterns in structured grids. In this paper, we propose an innovative domain-specific language (DSL), Mat2Stencil, with its compiler, for PDE solvers on structured grids. Mat2Stencil introduces a structured sparse matrix abstraction, facilitating modular, flexible, and easy-to-use expression of solvers across a broad spectrum, encompassing components such as Jacobi or Gauss-Seidel preconditioners, incomplete LU or Cholesky decompositions, and multigrid methods built upon them. Our DSL compiler subsequently generates matrix-free code consisting of generalized stencils through multi-stage programming. In addition to the embarrassingly parallel spatial loops of Jacobi-style stencils, the generated code admits spatial loop-carried dependences in the form of quasi-affine loops. We further propose a novel automatic parallelization technique for these spatially dependent loops, which offers compile-time deterministic task partitioning for threading, automatically calculates the necessary inter-thread synchronization, and generates an efficient multi-threaded implementation with fine-grained synchronization. Implementing 4 benchmark programs, 3 of which are pseudo-applications from the NAS Parallel Benchmarks written in 6.3% of the lines of code and 1 of which is matrix-free High Performance Conjugate Gradients written in 16.4% of the lines of code, we achieve up to 1.67× and on average 1.03× the performance of manual implementations.
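To picture the abstract's central idea, expressing solvers over a structured sparse matrix and compiling them to matrix-free stencil code, the C sketch below is purely illustrative: it is not Mat2Stencil syntax or actual compiler output, and the grid sizes, function names, and 5-point Laplacian are assumptions chosen for the example. The first kernel carries no spatial dependence, so its loops are embarrassingly parallel (the Jacobi-style case); the second, Gauss-Seidel-style sweep reads values updated earlier in the same sweep, which is the kind of spatially dependent quasi-affine loop the paper's automatic parallelization targets with compile-time task partitioning and fine-grained inter-thread synchronization.

    /* Illustrative sketch only; names and sizes are assumptions, not the
     * DSL's syntax or its generated code. */
    #define NX 256
    #define NY 256
    #define IDX(i, j) ((i) * NY + (j))

    /* Matrix-free application of a 5-point Laplacian: the structured sparse
     * matrix is never stored; its deterministic sparsity pattern becomes a
     * stencil over the grid. No loop-carried dependence: Jacobi-style,
     * parallel over the spatial dimensions. */
    void apply_laplacian(const double *u, double *out) {
        #pragma omp parallel for
        for (int i = 1; i < NX - 1; ++i)
            for (int j = 1; j < NY - 1; ++j)
                out[IDX(i, j)] = 4.0 * u[IDX(i, j)]
                               - u[IDX(i - 1, j)] - u[IDX(i + 1, j)]
                               - u[IDX(i, j - 1)] - u[IDX(i, j + 1)];
    }

    /* Gauss-Seidel-style sweep: u[i-1][j] and u[i][j-1] were updated earlier
     * in the same sweep, so the quasi-affine loop nest carries a spatial
     * dependence and cannot be parallelized naively; a wavefront-style
     * partitioning with inter-thread synchronization is needed. */
    void gauss_seidel_sweep(double *u, const double *f) {
        for (int i = 1; i < NX - 1; ++i)
            for (int j = 1; j < NY - 1; ++j)
                u[IDX(i, j)] = 0.25 * (f[IDX(i, j)]
                                     + u[IDX(i - 1, j)] + u[IDX(i + 1, j)]
                                     + u[IDX(i, j - 1)] + u[IDX(i, j + 1)]);
    }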
