Gate sizing and buffer insertion are crucial for VLSI physical optimization; however, conventional flows that apply them separately often yield suboptimal solutions because resources are allocated without coordination between the two steps. Existing simultaneous methods rely on oversimplified timing models or heuristic assumptions and therefore fail to unify the two tasks in a mathematically rigorous way. We present a differentiable physical optimization framework that integrates both techniques and is accelerated on GPUs. Key innovations include timing-aware buffer tree skeleton construction, physics-aware modeling, and discrete-aware optimization algorithms. Experiments demonstrate a 23% improvement in total negative slack (TNS) and a 12% improvement in worst negative slack (WNS) at similar power consumption, with a 30× speedup over a CPU-based optimization flow. This work establishes a new paradigm for co-optimizing interdependent physical design tasks with rigorous modeling and efficient computation.