I use FFmpeg5.1 in a linux server with four A10 gpus.
My machine config as follows:
without nvlink
Docker bases on nvidia/cuda:11.3.1-cudnn8-devel-ubuntu18.04
My work flow is to use one gpu (GPU0) for AI inference and then transfer to other gpu (GPU1 and GPU2) for post process and hardware encode. I copy the result of GPU0 to host and GPU1/2 copy the memory from host. Four threads work on GPU1 and four threads work on GPU2
Here is an occasional thread block in FFmpeg when I transfer memory from host to GPU2. The blocking stacks :
My problem is:
- Is it possible that the block is caused by the input param of cuMemcpy2DAsync
- Form the stack , it seems like one GPU2 thread calls sched_yield function but forget to release a rw pthread lock , so other GPU2 threads block as well. Dose the driver 470.57.02 in A10 has the problem in some corner case?
- Do you have tools to help me finding the root cause of the block?