Decoder utilization cannot reach 100% on Quadro RTX 4000

When I run a decoding performance test on an NVIDIA Quadro RTX 4000, a single process cannot drive the decoder to 100% utilization, no matter how many threads it uses. It seems to top out at about 70% for a single process. When I start a second process at the same time, utilization rises to about 90%, but still does not reach 100%. With a third process running as well, it reaches about 95%. The operating system is “CentOS Linux release 7.6.1810”, the NVIDIA driver version is 440.82, and the CUDA version is 10.2.
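
For anyone who wants to reproduce these numbers, here is a minimal sketch of one way to sample decoder utilization. It assumes the figures come from the `dec` column of `nvidia-smi dmon -s u`, which the post does not state, so the measurement method is an assumption. The function names `parse_dmon_decoder` and `sample_decoder_utilization` are hypothetical helpers, not part of any NVIDIA tool.

```python
import subprocess


def parse_dmon_decoder(text):
    """Extract the 'dec' column (percent) from `nvidia-smi dmon -s u` output."""
    dec_index = None
    samples = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            # Header lines start with '#'; the one naming the columns
            # contains 'dec'. The units line ('Idx', '%') is skipped.
            names = stripped.lstrip("#").split()
            if "dec" in names:
                dec_index = names.index("dec")
            continue
        fields = stripped.split()
        # Data rows line up with the header names; skip '-' placeholders.
        if dec_index is not None and len(fields) > dec_index:
            value = fields[dec_index]
            if value.isdigit():
                samples.append(int(value))
    return samples


def sample_decoder_utilization(count=10):
    """Run nvidia-smi for `count` samples and return the decoder readings.

    Assumption: nvidia-smi is on PATH and the driver supports `dmon`.
    """
    result = subprocess.run(
        ["nvidia-smi", "dmon", "-s", "u", "-c", str(count)],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_dmon_decoder(result.stdout)
```

Calling `sample_decoder_utilization()` while one, two, and then three decode processes are running should give readings comparable to the percentages above, provided the same GPU is being monitored. The command-line call requires Python 3.7 or later.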