CUDA_ERROR_INVALID_DEVICE when trying to create a decoder on P4000 with some drivers

This bug appeared in our product after updating the Nvidia Quadro P4000 driver from 462.31 to version 472.12, 472.47 or 511.09. We load video streams onto the GPU for decoding until up to 80% of the device memory is in use. When memory usage gets close to 80%, creating a decoder with cuvidCreateDecoder fails with CUDA_ERROR_OUT_OF_MEMORY. If we then try again, we get CUDA_ERROR_INVALID_DEVICE, after which the device crashes and does not work again until it is restarted.
The following code shows a while loop that tries to create the decoder up to 20 times with a 50 ms delay between attempts. If memory usage is close to 80%, cuvidCreateDecoder often fails in the first loop iteration with CUDA_ERROR_OUT_OF_MEMORY. In the second iteration, it fails with CUDA_ERROR_INVALID_DEVICE and the loop terminates. The device then stops working, and all decoders that were already running die as well. Note that neither increasing the delay nor removing the retry (dropping the while loop) solves the issue.
const int retryCountMax = 20, sleep = 50;
int retryCount = 0;
CUresult result = CUDA_ERROR_OUT_OF_MEMORY;
while (result == CUDA_ERROR_OUT_OF_MEMORY && retryCount < retryCountMax)
{
    result = CmNvidiaSDKFunctionWrapper::cudaCreateDecoder(&tempDecoder, &createInfo);
    if (result == CUDA_ERROR_OUT_OF_MEMORY)
    {
        std::stringstream ss;
        ss << "cudaCreateDecoder failed with memory allocation, retry in " << sleep
           << " ms. retry count: " << retryCount;
        OutputDebugString(ss.str().c_str());
        retryCount++;
        Sleep(sleep);
    }
}
Here are our output messages. They show that after the first CUDA_ERROR_OUT_OF_MEMORY error, every subsequent call returns CUDA_ERROR_INVALID_DEVICE, so the device stops working.
cudaCreateDecoder failed with memory allocation, retry in 50 ms. retry count: 0
cudaCreateDecoder failed, error code = 101, error description = invalid device ordinal
cudaCreateDecoder failed, error code = 101, error description = invalid device ordinal
cudaCreateDecoder failed, error code = 101, error description = invalid device ordinal
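
For anyone who wants to exercise the retry policy without a GPU, below is a minimal sketch of the same loop with the decoder call injected as a callback. It is not the code from the product: FakeResult, RetryOutcome and createWithRetry are hypothetical names, and FakeResult is only a stand-in for CUresult (the values 2 and 101 mirror CUDA_ERROR_OUT_OF_MEMORY and CUDA_ERROR_INVALID_DEVICE) so that the sketch compiles without cuda.h. It retries only on out-of-memory and records every result, which makes the "one out-of-memory, then invalid device" sequence easy to see.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// ASSUMPTION: stand-in for CUresult so the sketch builds without cuda.h.
// In real code, use CUresult and the CUDA_ERROR_* constants instead.
enum FakeResult { kSuccess = 0, kOutOfMemory = 2, kInvalidDevice = 101 };

struct RetryOutcome {
    FakeResult last;                  // result of the final attempt
    std::vector<FakeResult> history;  // every result, in call order
};

// Calls `create` until it succeeds, returns an error other than
// out-of-memory, or `maxAttempts` calls have been made.
inline RetryOutcome createWithRetry(const std::function<FakeResult()>& create,
                                    int maxAttempts,
                                    std::chrono::milliseconds delay)
{
    assert(maxAttempts > 0);
    RetryOutcome out{kOutOfMemory, {}};
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        out.last = create();
        out.history.push_back(out.last);
        if (out.last != kOutOfMemory) {
            break;  // success, or an error that retrying is not expected to fix
        }
        if (attempt + 1 < maxAttempts) {
            std::this_thread::sleep_for(delay);  // wait before the next attempt
        }
    }
    return out;
}
```

Passing a callback that returns kOutOfMemory once and kInvalidDevice afterwards reproduces the sequence in the log above: the helper stops after the second call, and history holds both codes so each one can be logged.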

Overall, after updating the Nvidia Quadro P4000 driver, two problems occur when creating a decoder: 1. CUDA_ERROR_OUT_OF_MEMORY is returned although more than 20% of device memory is still free, which is enough in our case. 2. Trying to create the decoder again returns CUDA_ERROR_INVALID_DEVICE, and the device ends up completely out of service.
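
To make the 80% / 20% figures concrete, here is a small sketch of the arithmetic. The helper names usedFraction and underBudget are hypothetical, and the byte counts are assumed to come from something like cuMemGetInfo, which reports free and total device memory. The sketch only covers the calculation, not the query itself.

```cpp
#include <cassert>
#include <cstddef>

// Fraction of device memory in use, given free and total amounts in the
// same unit (ASSUMPTION: e.g. the pair reported by cuMemGetInfo).
inline double usedFraction(std::size_t freeBytes, std::size_t totalBytes)
{
    assert(totalBytes > 0 && freeBytes <= totalBytes);
    return 1.0 - static_cast<double>(freeBytes) / static_cast<double>(totalBytes);
}

// True while usage is still below the budget. The 0.80 default mirrors the
// 80% limit mentioned in this post; it is not a driver or SDK constant.
inline bool underBudget(std::size_t freeBytes, std::size_t totalBytes,
                        double budget = 0.80)
{
    return usedFraction(freeBytes, totalBytes) < budget;
}
```

With made-up numbers, 2000 units free out of 8000 gives a used fraction of 0.75, which is under the budget, while 1000 free out of 8000 gives 0.875, which is over it. Logging this value right before each decoder creation would show how much memory was actually free when CUDA_ERROR_OUT_OF_MEMORY came back.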

This is a CUDA programming related issue, so let me move it to the CUDA forum. Thanks.

sure :-)