We present a deterministic sorting algorithm, SPMS (Sample, Partition, and Merge Sort), that interleaves the partitioning of a sample sort with merging. Sequentially, it sorts $n$ elements in $O(n \log n)$ time cache-obliviously with an optimal number of cache misses. The parallel complexity (or critical path length) of the algorithm is $O(\log n \cdot \log\log n)$, which improves on previous bounds for optimal cache-oblivious sorting. The algorithm also has low false sharing costs. When scheduled by a work-stealing scheduler in a multicore computing environment with a global shared memory and $p$ cores, each having a cache of size $M$ organized in blocks of size $B$, the costs of the additional cache misses and false sharing misses due to this parallel execution are bounded by the cost of $O(S \cdot M/B)$ and $O(S \cdot B)$ cache misses respectively, where $S$ is the number of steals performed during the execution. Finally, SPMS is resource oblivious in that the dependence on machine parameters appears only in the analysis of its performance, and not within the algorithm itself.
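To make the phrase "interleaves the partitioning of a sample sort with merging" more concrete, the following is a minimal sequential sketch of the general sample, partition, and merge idea. It is not the SPMS algorithm itself and does not achieve the cache, parallel, or false sharing bounds stated above. The function name `sample_partition_merge_sort`, the `base_size` cutoff, and the choice of roughly $\sqrt{n}$ runs and pivots are all assumptions made for illustration only.

```python
import bisect
import heapq
import math


def sample_partition_merge_sort(items, base_size=16):
    """Illustrative sketch only (hypothetical helper, not the SPMS algorithm)."""
    n = len(items)
    if n <= max(1, base_size):
        return sorted(items)

    # Split the input into about sqrt(n) runs and sort each run recursively.
    k = max(2, math.isqrt(n))
    chunk = math.ceil(n / k)
    runs = [
        sample_partition_merge_sort(items[i:i + chunk], base_size)
        for i in range(0, n, chunk)
    ]

    # Sample: take evenly spaced elements from every sorted run.
    stride = max(1, math.isqrt(chunk))
    sample = sorted(x for run in runs for x in run[stride - 1::stride])

    # Pick about k evenly spaced pivots from the sorted sample.
    step = max(1, len(sample) // k)
    pivots = sample[step - 1::step]

    # Partition: binary-search each sorted run for the pivot positions.
    bounds = [
        [0] + [bisect.bisect_right(run, p) for p in pivots] + [len(run)]
        for run in runs
    ]

    # Merge: for each bucket, merge the matching sorted pieces of all runs.
    out = []
    for b in range(len(pivots) + 1):
        pieces = [run[bd[b]:bd[b + 1]] for run, bd in zip(runs, bounds)]
        out.extend(heapq.merge(*pieces))
    return out
```

Because every run is already sorted, partitioning reduces to binary searches for the pivots, and each bucket is completed by merging sorted pieces rather than by sorting from scratch. A parallel, cache-oblivious version would need the careful subproblem sizing and data layout developed in the paper, which this sketch makes no attempt to reproduce.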