Best way to convert NV12 to RGBA

Frames generated by cuVid are in NV12 color format, and I need to convert them to RGBA for later processing. In the NPP library I found nppiNV21ToRGB_8u_P2C4R, which only handles NV21.

How can I do the conversion with the best performance? Should I convert NV12 to NV21 first?

I believe the cudaDecodeGL sample code has a function that demonstrates this (NV12 to ARGB).

I tried this sample code, but it only reaches about 100 fps for 4K on a GTX 1070, which is even slower than libyuv on the CPU.
I would expect the GPU to perform much better at this kind of work.

Sample code provided by NVIDIA is generally not optimized for performance; it tries to demonstrate important programming concepts with the tightest and clearest code possible. I am reasonably sure NVIDIA states as much in their documentation. Some possible courses of action (this list is not meant to be exhaustive):

(1) Optimize the baseline code provided by NVIDIA
(2) Search for a third party GPU-accelerated library (open source or commercial)
(3) Use a library running on the CPU
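
As a starting point for option (1), here is a minimal sketch of a direct NV12 to RGBA conversion, one thread per output pixel, with no intermediate NV21 step. NV12 and NV21 differ only in the byte order of the interleaved chroma pair (U,V versus V,U), so the kernel simply reads the pair in NV12 order. Everything here is an assumption for illustration, not code from the NVIDIA sample or from NPP: the names Rgba, yuvToRgba, nv12Pixel and nv12ToRgbaKernel are hypothetical, the coefficients are the common fixed-point BT.601 limited-range ones (your stream may need BT.709 or full range), and the chroma plane is assumed to share the luma pitch. The per-pixel math also compiles as plain C++, so it can be checked on the CPU.

```cpp
#include <cstddef>
#include <cstdint>

// Let the per-pixel helpers build as plain C++ as well as under nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical 4-byte pixel type; alignas(4) permits a single 32-bit store.
struct alignas(4) Rgba { uint8_t r, g, b, a; };

__host__ __device__ inline uint8_t clamp8(int v) {
    return static_cast<uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// Assumption: BT.601 limited-range YCbCr, 8.8 fixed-point coefficients.
__host__ __device__ inline Rgba yuvToRgba(uint8_t y, uint8_t u, uint8_t v) {
    int c = 298 * (static_cast<int>(y) - 16);
    int d = static_cast<int>(u) - 128;
    int e = static_cast<int>(v) - 128;
    Rgba out;
    out.r = clamp8((c + 409 * e + 128) >> 8);
    out.g = clamp8((c - 100 * d - 208 * e + 128) >> 8);
    out.b = clamp8((c + 516 * d + 128) >> 8);
    out.a = 255;
    return out;
}

// NV12: full-resolution Y plane, then a half-height plane of interleaved
// U,V pairs, each pair shared by a 2x2 block of luma samples.
__host__ __device__ inline Rgba nv12Pixel(const uint8_t* yPlane,
                                          const uint8_t* uvPlane,
                                          size_t pitch, int x, int y) {
    uint8_t luma = yPlane[y * pitch + x];
    const uint8_t* uv = uvPlane + (y / 2) * pitch + (x / 2) * 2;
    // NV12 stores U first; for NV21 the two indices would be swapped.
    return yuvToRgba(luma, uv[0], uv[1]);
}

#ifdef __CUDACC__
// One thread per output pixel; dstPitchBytes is the RGBA row pitch in bytes.
__global__ void nv12ToRgbaKernel(const uint8_t* yPlane, const uint8_t* uvPlane,
                                 size_t srcPitch, uint8_t* dst,
                                 size_t dstPitchBytes, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    Rgba* row = reinterpret_cast<Rgba*>(dst + y * dstPitchBytes);
    row[x] = nv12Pixel(yPlane, uvPlane, srcPitch, x, y);
}
#endif
```

A launch along the lines of `nv12ToRgbaKernel<<<dim3((width + 31) / 32, (height + 7) / 8), dim3(32, 8)>>>(...)` would cover the frame. This version is not tuned: having each thread handle a 2x2 block (so the chroma pair is read once) and keeping the decoded frame in device memory instead of copying it through the host are the usual next steps, but measure before assuming where the time goes.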