Hi,
I've run into an issue where cuMemcpyDtoHAsync never returns under certain conditions when the GPU is under heavy load.
I managed to minimize the program and have uploaded the reproducer:
The repro does the following:
- Initialize a CUDA context, then cuMemAlloc and cuArray3DCreate.
- cuMemcpy3DAsync (D to H)
- cuMemcpy3DAsync (H to D), then cuTexObjectCreate
- cuMemcpyDtoHAsync (hangs here!!)
That’s it.
I found that the issue rarely happens without high GPU load.
My machine has recently been running Folding@home in the background. If I stop the F@h task, the issue hardly ever happens.
Other random notes:
- The repro rate is around 80% (with high variance) in my environment with F@h running in the background.
- There are redundant cuCtxSetCurrent calls, but if I remove them, the repro rate seems to decrease (possibly my imagination).
- I can't identify which part of the program has the most effect on the issue.
- If I use cuMemcpyDtoH instead, the issue still happens.
Is this a CUDA issue, or am I doing something illegal?
My environment:
Windows 10 20H2
Core i9-9900K, 32GB DDR4
CUDA 11.1
GeForce RTX 3080
NVIDIA Driver 461.09
Thanks,