Prior works on GPU-accelerated Static Timing Analysis (GPU-STA) have struggled to gain industrial adoption, primarily because they aim to build standalone timing engines that cannot emulate the proprietary delay models used in commercial tools. In this paper, we adopt a different philosophy and present INSTA, the first differentiable, statistical GPU-STA engine, which achieves unprecedented accuracy and scalability through a one-time initialization from any reference tool. INSTA brings two transformative capabilities to Physical Design (PD): (1) rapid, high-fidelity timing analysis for incremental netlist updates, and (2) truly global, gradient-based timing optimization at scale. Notably, INSTA achieves a near-perfect correlation of 0.999 with an industry-leading signoff tool on a 15-million-pin design in a commercial 3nm node, with a runtime under 0.1 seconds. Experimental results showcase INSTA’s capabilities in three PD applications: (1) as a fast evaluator in an industrial gate sizing flow, where it delivers 25x faster incremental timing updates with almost no accuracy loss; (2) INSTA-Size, a gradient-based gate sizer that achieves up to 15% better Total Negative Slack (TNS) than the reference signoff engine while sizing 68% fewer cells; and (3) INSTA-Place, a differentiable timing-driven global placer that outperforms the state-of-the-art net-weighting placer by up to 16% in Half-Perimeter Wirelength (HPWL) and 59.4% in TNS on the ICCAD’15 benchmark.