The ever-increasing power of graphics processing units (GPUs) has opened new opportunities for pushing static timing analysis (STA) to a new performance milestone. Developing a CPU-GPU parallel STA engine is an extremely challenging task. We need to consider the unique problem characteristics of STA and the distinct performance models of the CPU and GPU, both of which call for strategic decomposition of the workload to benefit from heterogeneous parallelism. In this paper, we propose an efficient implementation for accelerating STA on a GPU. We leverage a task-based approach to decompose the STA workload into dependent CPU and GPU tasks, so that kernel computation and data processing overlap effectively. We develop GPU-efficient data structures and high-performance kernels to speed up key STA steps, including levelization, delay calculation, and graph update. Our acceleration framework is flexible and adaptive: when tasks are scarce, such as in incremental timing, we run in the normal CPU mode, and we enable the GPU when tasks are massive. We have implemented our algorithms on top of OpenTimer and demonstrated promising speed-ups on large designs. For example, we achieve up to a 3.69× speed-up on a large design of 1.6M gates and 1.6M nets using one GPU.
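To make the adaptive CPU/GPU policy concrete, the sketch below (not the authors' code) shows one way such a dispatch could look: a small workload stays on the CPU, while a large one is offloaded to a GPU kernel. The kernel performs a deliberately simplified, flat delay-propagation step as a stand-in for the real STA computation; all names here (THRESHOLD, propagate_cpu, propagate_gpu) are illustrative assumptions rather than OpenTimer APIs.

// Minimal sketch of an adaptive CPU/GPU dispatch for a simplified
// delay-propagation step. Assumes CUDA; names are hypothetical.
#include <cstddef>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical cutoff: below this many arcs, GPU launch and transfer
// overhead outweighs the speed-up, so the work stays on the CPU.
constexpr std::size_t THRESHOLD = 1 << 16;

__global__ void propagate_gpu(const float* delay, float* arrival, std::size_t n) {
  std::size_t i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) arrival[i] += delay[i];   // simplified arc-delay propagation
}

void propagate_cpu(const float* delay, float* arrival, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) arrival[i] += delay[i];
}

void propagate(const std::vector<float>& delay, std::vector<float>& arrival) {
  const std::size_t n = delay.size();
  if (n < THRESHOLD) {                 // scarce tasks: normal CPU mode
    propagate_cpu(delay.data(), arrival.data(), n);
    return;
  }
  // Massive tasks: copy to the device, launch the kernel, copy back.
  float *d_delay, *d_arrival;
  cudaMalloc(&d_delay, n * sizeof(float));
  cudaMalloc(&d_arrival, n * sizeof(float));
  cudaMemcpy(d_delay, delay.data(), n * sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(d_arrival, arrival.data(), n * sizeof(float), cudaMemcpyHostToDevice);
  propagate_gpu<<<(n + 255) / 256, 256>>>(d_delay, d_arrival, n);
  cudaMemcpy(arrival.data(), d_arrival, n * sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d_delay);
  cudaFree(d_arrival);
}

int main() {
  std::vector<float> delay(1 << 20, 0.5f), arrival(1 << 20, 1.0f);
  propagate(delay, arrival);           // large batch: takes the GPU path
  std::printf("arrival[0] = %f\n", arrival[0]);  // expect 1.5
  return 0;
}

In the paper's framework, the same idea applies at the level of whole task graphs rather than a single loop: the threshold decision determines whether an update is processed in the normal CPU mode or offloaded to GPU kernels.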