Graph invariant learning (GIL) seeks invariant relations between graphs and labels under distribution shifts. Recent works try to extract an invariant subgraph to improve out-of-distribution (OOD) generalization, yet existing approaches either lack explicit control over compactness or rely on hard top-$k$ selection, which shrinks the solution space and is only partially differentiable. In this paper, we provide an in-depth analysis of the drawbacks of some existing works and propose three general principles for invariant subgraph extraction: 1) separability, as encouraged by our sparsity-driven mechanism, to filter out the irrelevant common features; 2) softness, for a broader solution space; and 3) differentiability, for a sound end-to-end optimization pipeline. Specifically, building on optimal transport, we propose Graph Sinkhorn Attention (GSINA), a fully differentiable, cardinality-constrained attention mechanism that assigns sparse-yet-soft edge weights via Sinkhorn iterations and induces node attention. GSINA provides explicit controls for separability and softness, and uses a Gumbel reparameterization to stabilize training. Its convergence behavior is also theoretically studied. Extensive empirical experimental results on both synthetic and real-world
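To make the idea of sparse-yet-soft, cardinality-constrained weights concrete, the following is a minimal sketch and not the paper's implementation. It assumes a common two-anchor optimal-transport formulation of soft top-$k$: each edge score is transported to a "drop" anchor (0) or a "keep" anchor (1), with the "keep" column constrained to carry mass $k/n$. The function name `sinkhorn_soft_topk`, the parameters `k`, `tau`, `n_iters`, and `gumbel_scale`, the min-max normalization, and the squared-distance cost are all illustrative assumptions.

```python
import numpy as np


def _logsumexp(a, axis):
    # Numerically stable log-sum-exp along one axis.
    m = a.max(axis=axis, keepdims=True)
    out = m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def sinkhorn_soft_topk(scores, k, tau=0.1, n_iters=200, gumbel_scale=0.0, rng=None):
    """Soft top-k weights via entropic optimal transport (hypothetical helper).

    Returns weights that sum to k; a smaller tau gives a sharper, closer to
    hard top-k selection, a larger tau gives a softer one.
    """
    scores = np.asarray(scores, dtype=np.float64)
    n = scores.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")

    # Optional Gumbel perturbation of the scores (assumed form of the noise).
    if gumbel_scale > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        scores = scores + gumbel_scale * rng.gumbel(size=n)

    # Min-max normalize scores to [0, 1] so the two anchors are meaningful.
    s = (scores - scores.min()) / (scores.max() - scores.min() + 1e-12)

    # Cost of sending each item to the "drop" anchor (0) or "keep" anchor (1).
    cost = np.stack([s ** 2, (s - 1.0) ** 2], axis=1)  # shape (n, 2)
    log_kernel = -cost / tau

    # Marginals: uniform over items; the "keep" column carries mass k / n.
    log_mu = np.full(n, -np.log(n))
    log_nu = np.log(np.array([(n - k) / n, k / n]))

    # Log-domain Sinkhorn iterations on the dual potentials f and g.
    f = np.zeros(n)
    g = np.zeros(2)
    for _ in range(n_iters):
        f = log_mu - _logsumexp(log_kernel + g[None, :], axis=1)
        g = log_nu - _logsumexp(log_kernel + f[:, None], axis=0)

    plan = np.exp(log_kernel + f[:, None] + g[None, :])
    return n * plan[:, 1]  # rescale the "keep" column so the weights sum to k


# Example: six hypothetical edge scores, keep a soft budget of k = 2 edges.
scores = np.array([2.0, -1.0, 0.5, 3.0, 0.0, -2.0])
w = sinkhorn_soft_topk(scores, k=2, tau=0.1)
```

In this sketch `k` plays the role of the cardinality (sparsity) control and `tau` the softness control. Every step is a smooth function of the scores, so the same computation written in an autodiff framework would be differentiable end to end.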
