The problem was the CUDA version for the Ampere architecture. I changed CUDA 10 to CUDA 11 and cuDNN 7.6 to cuDNN 8.2. Now it is fast: the first inference takes less than 30 seconds, and subsequent inferences run at approximately 20 FPS (I thought the A30 would be faster).
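To compare numbers like these, it helps to time the first call separately from the following ones, since the first call includes one-time setup costs. Below is a minimal sketch using only the standard library. The helper name `measure_inference` and its parameters are hypothetical, and `infer` stands for whatever callable runs your model on one input.

```python
import time


def measure_inference(infer, sample, steady_runs=50):
    """Return (first_call_seconds, steady_state_fps) for a callable `infer`.

    `infer` and `sample` are placeholders for your own model call and input.
    """
    # First call: includes one-time costs such as library loading,
    # kernel compilation, and memory allocation.
    start = time.perf_counter()
    infer(sample)
    first_call_s = time.perf_counter() - start

    # Following calls: average throughput once everything is warmed up.
    start = time.perf_counter()
    for _ in range(steady_runs):
        infer(sample)
    elapsed = time.perf_counter() - start

    fps = steady_runs / elapsed if elapsed > 0 else float("inf")
    return first_call_s, fps


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a GPU or TensorFlow.
    def fake_infer(x):
        time.sleep(0.002)
        return x

    first_s, fps = measure_inference(fake_infer, sample=None, steady_runs=20)
    print(f"first call: {first_s:.3f} s, steady state: {fps:.1f} FPS")
```

With TensorFlow, `infer` could be something like `lambda x: model(x, training=False).numpy()`, where `model` is your own loaded model. Converting the result to NumPy is a simple way to make sure the GPU work has actually finished before the timer stops.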
For reference, I read somewhere that CUDA 10 is recommended for the Turing architecture and CUDA 11 for the Ampere architecture. That worked for me.
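The rule of thumb above can be written down as a small lookup. This is only a sketch: the table and the function name `cuda_supports` are my own, and the table covers just the two architectures mentioned here. The A30 has compute capability 8.0, which CUDA toolkits before 11.0 cannot target natively. That is a likely reason the first inference was so slow on CUDA 10, since the kernels would have to be JIT-compiled from PTX at startup.

```python
# Minimum CUDA toolkit release that can natively target each compute capability.
# Only the architectures discussed in this thread are listed.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 5): (10, 0),  # Turing (e.g. T4, RTX 20 series)
    (8, 0): (11, 0),  # Ampere (A100, A30)
    (8, 6): (11, 1),  # Ampere (e.g. RTX 30 series)
}


def cuda_supports(capability, cuda_version):
    """Return True if `cuda_version` can natively target `capability`.

    Both arguments are (major, minor) tuples, e.g. (8, 0) and (11, 0).
    """
    minimum = MIN_CUDA_FOR_CAPABILITY.get(capability)
    if minimum is None:
        raise KeyError(f"compute capability {capability} is not in the table")
    # Tuples compare element by element, so (11, 0) >= (10, 2) works as expected.
    return cuda_version >= minimum


if __name__ == "__main__":
    a30 = (8, 0)
    print("A30 with CUDA 10.0:", cuda_supports(a30, (10, 0)))
    print("A30 with CUDA 11.0:", cuda_supports(a30, (11, 0)))
```

Note that this only checks the lower bound. A Turing card also works with CUDA 11, so the "CUDA 10 for Turing" part is a minimum, not a requirement.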
Environment
GPU Type: NVIDIA A30
Nvidia Driver Version: 512.78
CUDA Version: 11.0
CUDNN Version: 8.2
Operating System + Version: Windows Server 2016
Python Version (if applicable): 3.7
TensorFlow Version (if applicable): 2.4.0 (specifically tensorflow-gpu)