Common path pessimism removal (CPPR) is imperative for eliminating redundant pessimism during static timing analysis (STA). However, turning on CPPR can significantly increase the analysis runtime by 10-100× in large designs. Recent years have seen much research on improving the algorithmic efficiencies of CPPR, but most are architecturally constrained by either the speed-accuracy trade-off or design-specific pruning heuristics. In this paper, we introduce a novel CPPR algorithm that is provably good and practically efficient. We have evaluated our algorithm on large industrial designs and demonstrated promising performance over the current state-of-the-art. As an example, our algorithm outperforms the baseline by 36-135× faster when generating the top-10K post-CPPR critical paths on a million-gate design. At the extreme, our algorithm with one core is even 4-16× faster than the baseline with 8 cores. Our algorithm also outperforms the commercial STA engine PrimeTime up to 26.99× faster. By exploiting parallelism within the circuit graph, we can reduce the memory consumption of our algorithm by 30%, with only 3% runtime increase.