I am using Ubuntu 14.04, a GTX 1080, and CUDA 8.0 RC. I updated the graphics card driver to nvidia-367 and installed DIGITS 3.
The following error was reported when training a network in DIGITS:
ERROR: Check failed: error == cudaSuccess (8 vs. 0) invalid device function
ip2 needs backward computation.
relu1 needs backward computation.
ip1 needs backward computation.
pool2 needs backward computation.
conv2 needs backward computation.
pool1 needs backward computation.
conv1 needs backward computation.
scale does not need backward computation.
label_mnist_1_split does not need backward computation.
mnist does not need backward computation.
This network produces output accuracy
This network produces output loss
Network initialization done.
Solver scaffolding done.
Starting Optimization
Solving
Learning Rate Policy: step
Iteration 0, Testing net (#0)
Data layer prefetch queue empty
Check failed: error == cudaSuccess (8 vs. 0) invalid device function
Any suggestions as to what I should do to fix this? Thank you.
An "invalid device function" error usually means that the code you are trying to run was not compiled for the GPU architecture you are trying to run it on.
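The architecture rule behind this error can be sketched in a few lines. This is a simplified model of CUDA's documented compatibility rules, not code from CUDA, Caffe, or DIGITS: native machine code (SASS, `sm_XY`) runs only on GPUs with the same major compute capability and an equal or newer minor revision, while PTX (`compute_XY`) can be JIT-compiled by the driver for any GPU with an equal or newer compute capability. The function name `can_run` and the `(kind, (major, minor))` target format are hypothetical, and special cases (e.g. Tegra or architecture-specific targets) are ignored. The GTX 1080 has compute capability 6.1.

```python
from typing import Iterable, Tuple

# Hypothetical format: each compiled target is (kind, (major, minor)),
# where kind is "sm" (SASS machine code) or "compute" (PTX virtual code).
Target = Tuple[str, Tuple[int, int]]


def can_run(device_cc: Tuple[int, int], targets: Iterable[Target]) -> bool:
    """Return True if at least one compiled target can run on device_cc."""
    dev_major, dev_minor = device_cc
    for kind, (major, minor) in targets:
        if kind == "sm":
            # SASS runs only on the same major architecture,
            # with an equal or newer minor revision.
            if major == dev_major and minor <= dev_minor:
                return True
        elif kind == "compute":
            # PTX can be JIT-compiled by the driver for any device
            # whose compute capability is equal or newer.
            if (major, minor) <= device_cc:
                return True
        else:
            raise ValueError(f"unknown target kind: {kind!r}")
    # No usable kernel image: at runtime this shows up as
    # "invalid device function".
    return False


GTX_1080 = (6, 1)  # Pascal, compute capability 6.1
print(can_run(GTX_1080, [("sm", (5, 2))]))                       # False
print(can_run(GTX_1080, [("sm", (5, 2)), ("compute", (5, 2))]))  # True
print(can_run(GTX_1080, [("sm", (6, 1))]))                       # True
```

In this model, a binary that ships only Maxwell (`sm_52`) SASS and no PTX has nothing the GTX 1080 can load, which matches the symptom in the log above.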
This error shouldn't come from DIGITS itself (DIGITS is basically a graphical front end for a DL framework); it is presumably arising either from Caffe (or whatever framework you are using in DIGITS) or from the cuDNN library itself.
You could try updating cuDNN to the newest version. This would likely require you to rebuild (or at least re-link) Caffe.
If you installed DIGITS, Caffe, and cuDNN using a package manager, this may be a substantial amount of work, at least compared to what you have done so far.
Thank you @txbob
Yes, I did use the package manager to install DIGITS…
Besides the error above, I also cannot see the option to select a GPU on the page for creating a new model (in the same section as the model name). Any idea why?
How did you install CUDA 8 RC? Did you do any verification of the install?
The problem mentioned in the first post in this thread is actually a problem in the NVIDIA Caffe (NVcaffe) build (0.14.2) that is pulled in by the package manager install of DIGITS 3 (the cuDNN library is OK). The problem should be fixed in the upcoming 0.15 NVcaffe packages.
At this time, some options are:
- Build the caffe-0.15 branch of NVcaffe manually from source (instructions: DIGITS/BuildCaffe.md at digits-4.0 · NVIDIA/DIGITS · GitHub), then reconfigure DIGITS 3 to use the new build (instructions: DIGITS/UbuntuInstall.md at digits-3.0 · NVIDIA/DIGITS · GitHub).
- Wait for the new DIGITS v4 and NVcaffe v0.15 packages to be released. I can't predict the future, but this might happen in the next 30 days or so.
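If you take the build-from-source route, the build has to include code for the GTX 1080's architecture (compute capability 6.1, which requires CUDA 8.0). The sketch below generates standard nvcc `-gencode` flags for a list of compute capabilities: native SASS for each one, plus PTX for the newest so the driver can JIT-compile for later GPUs. Using the result as the `CUDA_ARCH` line of Caffe's Makefile.config is an assumption based on upstream Caffe's Makefile.config.example; the linked BuildCaffe.md instructions may use a CMake build instead, so check them first. The helper name `gencode_flags` is hypothetical.

```python
from typing import Iterable, Tuple


def gencode_flags(ccs: Iterable[Tuple[int, int]]) -> str:
    """Build nvcc -gencode flags for the given (major, minor) compute capabilities."""
    ccs = sorted(set(ccs))
    if not ccs:
        raise ValueError("need at least one compute capability")
    flags = []
    for major, minor in ccs:
        arch = f"{major}{minor}"
        # Native SASS for this exact architecture.
        flags.append(f"-gencode arch=compute_{arch},code=sm_{arch}")
    # Also embed PTX for the newest listed architecture so the driver
    # can JIT-compile it for GPUs newer than any listed here.
    newest = f"{ccs[-1][0]}{ccs[-1][1]}"
    flags.append(f"-gencode arch=compute_{newest},code=compute_{newest}")
    return " ".join(flags)


# Hypothetical target set: Maxwell (5.2) plus the GTX 1080 (6.1).
print("CUDA_ARCH := " + gencode_flags([(5, 2), (6, 1)]))
```

Listing only the architectures you actually need keeps build time and binary size down; the trailing PTX entry is a safety net rather than a substitute for native code.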
Thank you @txbob, I will wait for the DIGITS v4 release.