I’ve installed CUDA and Caffe without errors. While running
make runtest
in the Caffe source directory, I keep getting the
Check failed: error == cudaSuccess (10 vs. 0) invalid device ordinal
error. After running ./deviceQuery in the CUDA directory, I got a very similar error:
cudaGetDeviceCount returned 10
-> invalid device ordinal
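Both messages carry the same CUDA runtime `cudaError_t` value. Caffe's check prints the returned code against `cudaSuccess` as `(10 vs. 0)`, and deviceQuery prints the code returned by `cudaGetDeviceCount`. The snippet below is a minimal sketch that pulls that code out of either log line and names it. The lookup table is an assumption: it covers only a few values in the older numbering that matches these logs (CUDA 7.x era). Newer toolkits renumbered the enum; current releases use 101 for `cudaErrorInvalidDevice`. `decode_cuda_error` is a hypothetical helper name.

```python
import re

# Partial table of legacy (older-numbering, e.g. CUDA 7.x) cudaError_t values.
# Assumption: only a handful of entries; newer toolkits use different numbers.
LEGACY_CUDA_ERRORS = {
    0: ("cudaSuccess", "no error"),
    10: ("cudaErrorInvalidDevice", "invalid device ordinal"),
    35: ("cudaErrorInsufficientDriver",
         "CUDA driver version is insufficient for CUDA runtime version"),
    38: ("cudaErrorNoDevice", "no CUDA-capable device is detected"),
}

# deviceQuery prints "cudaGetDeviceCount returned N"; Caffe's glog check prints "(N vs. 0)".
_PATTERNS = (
    re.compile(r"cudaGetDeviceCount returned (\d+)"),
    re.compile(r"\((\d+) vs\. 0\)"),
)


def decode_cuda_error(log_line):
    """Return (code, enum_name, message) for the first CUDA error code found, else None."""
    for pattern in _PATTERNS:
        match = pattern.search(log_line)
        if match:
            code = int(match.group(1))
            name, message = LEGACY_CUDA_ERRORS.get(code, ("<unknown>", "<not in table>"))
            return code, name, message
    return None


if __name__ == "__main__":
    for line in (
        "Check failed: error == cudaSuccess (10 vs. 0) invalid device ordinal",
        "cudaGetDeviceCount returned 10",
    ):
        print(decode_cuda_error(line))
```

Both lines decode to code 10, `cudaErrorInvalidDevice`. This confirms that Caffe and deviceQuery are failing at the same point: device enumeration.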
After even more googling, I ran nvidia-smi and got:
Unable to determine the device handle for GPU 0000:04:00.0: Unable to communicate with GPU because it is insufficiently powered.
This may be because not all required external power cables are
attached, or the attached cables are not seated properly.
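The nvidia-smi message identifies the card only by its PCI address, `0000:04:00.0`. This address is `domain:bus:device.function` in hexadecimal. The sketch below assumes that address format and extracts it, so the physical card can be matched with `lspci -s 04:00.0 -v` (plain `lspci` omits the domain). The names `find_pci_addresses` and `PciAddress` are hypothetical.

```python
import re
from typing import NamedTuple

# PCI address: DDDD:BB:DD.F (domain:bus:device.function), hex digits; function is 0-7.
_PCI_RE = re.compile(
    r"\b([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])\b"
)


class PciAddress(NamedTuple):
    domain: int
    bus: int
    device: int
    function: int

    def short(self):
        """Form accepted by `lspci -s` and shown by plain `lspci` (domain omitted)."""
        return f"{self.bus:02x}:{self.device:02x}.{self.function:x}"


def find_pci_addresses(text):
    """Return every PCI address mentioned in a block of tool output."""
    return [
        PciAddress(*(int(group, 16) for group in m.groups()))
        for m in _PCI_RE.finditer(text)
    ]


if __name__ == "__main__":
    msg = ("Unable to determine the device handle for GPU 0000:04:00.0: "
           "Unable to communicate with GPU because it is insufficiently powered.")
    for addr in find_pci_addresses(msg):
        print(addr, addr.short())
```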
I contacted the sysadmin, who told me all cables are properly connected, but the problem remained. I then ran nvidia-debugdump -l and got:
Found 2 NVIDIA devices
Device ID: 0
Device name: NVS 315 (*PrimaryCard)
GPU internal ID: GPU-fc8b8a6f-c28f-9860-1469-453ea6a4abb0
Error: nvmlDeviceGetHandleByIndex(): Insufficient External Power
FAILED to get details on GPU (0x1): Insufficient External Power
I also tried switching the driver with update-alternatives --config x86_64-linux-gnu_gl_conf:
  Selection    Path                                         Priority   Status
------------------------------------------------------------
* 0            /usr/lib/nvidia-352/ld.so.conf                8604      auto mode
  1            /usr/lib/nvidia-352-prime/ld.so.conf          8603      manual mode
  2            /usr/lib/nvidia-352/ld.so.conf                8604      manual mode
  3            /usr/lib/x86_64-linux-gnu/mesa/ld.so.conf     500       manual mode
Press enter to keep the current choice[*], or type selection number: 1
update-alternatives: using /usr/lib/nvidia-352-prime/ld.so.conf to provide /etc/ld.so.conf.d/x86_64-linux-gnu_GL.conf (x86_64-linux-gnu_gl_conf) in manual mode
but deviceQuery keeps returning the same error.
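The `nvidia-debugdump -l` output above reports two devices but returns details only for device 0 (the NVS 315 primary card). Device 1 fails with `Insufficient External Power`, which matches the nvidia-smi message. The sketch below summarizes that output so the failing index stands out. It assumes the exact line formats shown in this post, and other driver versions may word them differently. `summarize_debugdump` is a hypothetical helper.

```python
import re

# Line formats as they appear in this post's `nvidia-debugdump -l` output (assumed).
_FOUND_RE = re.compile(r"Found (\d+) NVIDIA devices")
_OK_RE = re.compile(r"Device ID:\s*(\d+)\s*\nDevice name:\s*(.+)")
_FAIL_RE = re.compile(r"FAILED to get details on GPU \((0x[0-9a-fA-F]+)\):\s*(.+)")

SAMPLE = """\
Found 2 NVIDIA devices
Device ID: 0
Device name: NVS 315 (*PrimaryCard)
GPU internal ID: GPU-fc8b8a6f-c28f-9860-1469-453ea6a4abb0
Error: nvmlDeviceGetHandleByIndex(): Insufficient External Power
FAILED to get details on GPU (0x1): Insufficient External Power
"""


def summarize_debugdump(text):
    """Return the reported device count, devices that answered, and failures."""
    found = _FOUND_RE.search(text)
    return {
        "reported": int(found.group(1)) if found else None,
        # Devices that printed an ID followed by a name were queried successfully.
        "ok": {int(idx): name.strip() for idx, name in _OK_RE.findall(text)},
        # int(..., 16) accepts the "0x" prefix printed by the tool.
        "failed": {int(idx, 16): why.strip() for idx, why in _FAIL_RE.findall(text)},
    }


if __name__ == "__main__":
    print(summarize_debugdump(SAMPLE))
```

For the sample, `reported` is 2 while `ok` holds only device 0. The card that cannot be reached is therefore device 1.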
I’d appreciate any suggestions on the matter.