FFmpeg transcoding processes get stuck for a long time, making the GPU unusable

I am running NVIDIA driver 410 with CUDA 10 on a 1080 Ti. I have periodic video transcoding processes (ffmpeg) that run on the GPU. After running for some time, these processes get stuck and never complete. When I then try to spawn a new transcoding session, I get an error saying that no GPU is available.

The only solution so far has been to kill all the stuck processes and then unload and reload the NVIDIA kernel module. The logs show an Xid 31 error, which is “a GPU memory page fault”. I am not sure whether this is a driver issue (I have also tried nvidia-396 with CUDA 9.2 and get the same error).
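
To correlate the hangs with the Xid events, the kernel log can be scanned for them. Below is a minimal Python sketch, assuming the driver's log lines follow the usual `NVRM: Xid (<PCI device>): <code>, ...` layout. The `find_xids` helper and the sample line are hypothetical and only illustrate the format; they are not real output from this system.

```python
import re

# Matches the start of an NVIDIA Xid kernel log line:
# the PCI device in parentheses, followed by the numeric Xid code.
# Assumption: the driver logs lines shaped like "NVRM: Xid (PCI:...): 31, ...".
XID_RE = re.compile(r"NVRM: Xid \((?P<device>[^)]+)\): (?P<code>\d+)")


def find_xids(log_text):
    """Return a list of (device, xid_code) tuples, one per Xid line found."""
    return [
        (m.group("device"), int(m.group("code")))
        for m in XID_RE.finditer(log_text)
    ]


# Illustrative input only; a real run would read the kernel log instead.
sample = (
    "[ 1234.567890] NVRM: Xid (PCI:0000:01:00): 31, "
    "Ch 00000010, engmask 00000101, intr 10000000\n"
)

for device, code in find_xids(sample):
    print(f"Xid {code} reported on {device}")
```

In practice the text passed to `find_xids` would be the output of `dmesg` or the system log, captured around the time a transcoding process stops making progress.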

Any ideas on how to proceed with debugging?

This is the description I see for Xid 31:

https://docs.nvidia.com/deploy/xid-errors/index.html#topic_5_2

I tried running a transcoding session on a sample .ts file under cuda-memcheck, which reports the following error:

Program hit CUDA_ERROR_INVALID_CONTEXT (error 201) due to “invalid device context” on CUDA API call to cuCtxSynchronize.

Any ideas why this might be happening?

Attaching a debugger to the process just makes it hang forever.