Need help with an interesting but very annoying multi-GPU problem

Hello everyone, I hope someone can help me with the problem described below.

OS: Windows 10 (64-bit)
IDE: Visual Studio

I have four GTX 1080 Ti cards installed, with device IDs 0, 1, 2, and 3.

I run the same CUDA code on each GPU, using cudaSetDevice(deviceID) to select which one gets the job. GPUs 0, 2, and 3 work perfectly. However, the GPU with deviceID=1 always fails, and the terminal shows:

A long list of “CURAND_GENERATE() failed!”
followed by
“CUFFT error: Plan creation failed CUFFT error: ExecC2C failed CUFFT error: Failed to synchronize”

Can someone tell me what is wrong with this GPU? Why does the same code work on GPUs 0, 2, and 3 but not on GPU 1?

I am really frustrated. This problem has troubled me for a long time. As far as I can tell, all four GPUs are configured identically.

Thank you, all!

It could be a power-related issue: perhaps the power cables leading to that GPU are loose, so try re-seating them with the system powered down. Also run nvidia-smi and check that the GPU is not in an error state. It could also be that your motherboard has a problem with that slot, or that this particular GPU is bad. This is where troubleshooting one thing at a time helps.
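
To narrow things down one step at a time, a small standalone probe along the following lines may help. It is only a minimal sketch, not a definitive test: the helper names ok and probe_device, the 1M-element buffer, and the 1024-point FFT are all assumptions chosen for illustration. It selects each device in turn, prints its name and free memory, and then tries a small cuRAND generation and a cuFFT plan creation separately, so you can see which call fails first on device 1.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <curand.h>
#include <cufft.h>

// Report a failed CUDA runtime call; returns true on success.
// (Hypothetical helper, not part of any library.)
static bool ok(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::printf("  %s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Run a few small, independent checks on one device (hypothetical helper).
static bool probe_device(int id) {
    std::printf("Device %d\n", id);
    if (!ok(cudaSetDevice(id), "cudaSetDevice")) return false;

    cudaDeviceProp prop;
    if (!ok(cudaGetDeviceProperties(&prop, id), "cudaGetDeviceProperties")) return false;
    std::printf("  name: %s\n", prop.name);

    // A GPU that also drives a display usually has less free memory.
    size_t free_b = 0, total_b = 0;
    if (!ok(cudaMemGetInfo(&free_b, &total_b), "cudaMemGetInfo")) return false;
    std::printf("  free memory: %zu MiB of %zu MiB\n", free_b >> 20, total_b >> 20);

    const size_t n = 1 << 20;  // arbitrary small test size
    float* d_buf = nullptr;
    if (!ok(cudaMalloc((void**)&d_buf, n * sizeof(float)), "cudaMalloc")) return false;

    bool pass = true;

    // cuRAND: create a generator and fill the buffer once.
    curandGenerator_t gen;
    curandStatus_t cs = curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
    if (cs == CURAND_STATUS_SUCCESS) {
        cs = curandGenerateUniform(gen, d_buf, n);
        curandDestroyGenerator(gen);
    }
    if (cs != CURAND_STATUS_SUCCESS) {
        std::printf("  cuRAND failed, status %d\n", (int)cs);
        pass = false;
    }

    // cuFFT: only create and destroy a small C2C plan.
    cufftHandle plan;
    cufftResult fr = cufftPlan1d(&plan, 1024, CUFFT_C2C, 1);
    if (fr == CUFFT_SUCCESS) {
        cufftDestroy(plan);
    } else {
        std::printf("  cuFFT plan creation failed, status %d\n", (int)fr);
        pass = false;
    }

    // Surface any asynchronous error from the work above.
    pass = ok(cudaDeviceSynchronize(), "cudaDeviceSynchronize") && pass;
    cudaFree(d_buf);
    return pass;
}

int main() {
    int count = 0;
    if (!ok(cudaGetDeviceCount(&count), "cudaGetDeviceCount")) return 1;

    int failures = 0;
    for (int id = 0; id < count; ++id) {
        if (!probe_device(id)) ++failures;
    }
    std::printf("%d of %d devices failed\n", failures, count);
    return failures == 0 ? 0 : 1;
}
```

Assuming the file is saved as probe.cu, it should build with something like nvcc probe.cu -lcurand -lcufft. If device 1 fails even this small test, that points toward hardware, power, or the slot. If it passes, comparing the reported free memory across the four devices is a reasonable next thing to look at.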

Hello Vacaloca,

Thank you. I also use this GPU to drive the monitor. The display works fine on this GPU, but the CUDA code does not. Does that rule out a problem with the GPU itself?