No GPU option in digits-devserver


While Creating Image Classification Model with DIGITS, I don’t have an option to select a GPU.
What’s the problem?

Thanks, Yoni.

Hi mandel.yonatan, what are the specifications of the system you are running DIGITS on?


Ubuntu 16.04 LTS
Intel Core i7-6700 CPU @ 3.40GHz × 4

Thank you.

Have you installed cuDNN and nvcaffe-0.15 correctly? DIGITS uses them underneath to access the GPU.

Yes, according to your GitHub:

How can I check the installation went ok?

If it went correctly, and your GPU is supported (I'm not entirely sure about the GT 730 2GB), DIGITS should normally just start with the GPU.
Barring that, you can navigate a terminal into your Caffe build tree and try running:

$ make runtest

Generally speaking, you’ll want a training GPU with at least 6GB of memory for DNNs like AlexNet/GoogLeNet/ResNet on ImageNet/COCO.
You may be able to change the Batch Size and Batch Accumulation hyperparameters; see this step of the tutorial for reference.
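As a rough sketch of how those two hyperparameters interact (variable names here are illustrative, not the exact DIGITS field names): only the batch-size samples are resident on the GPU at once, while the solver accumulates gradients over several passes before each weight update, so memory usage scales with the batch size while the effective batch stays larger.

```python
# Illustrative numbers only, not taken from a specific DIGITS job.
batch_size = 2          # samples held in GPU memory per forward/backward pass
batch_accumulation = 5  # gradient accumulation steps before a weight update

# The solver behaves as if it trained with this batch size,
# while GPU memory only has to hold `batch_size` samples at a time.
effective_batch = batch_size * batch_accumulation
print(effective_batch)  # 10
```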


In a terminal I navigated to the caffe/build directory and ran make runtest.
I don’t think it helped, since I still don’t have an option to select a GPU under “New Image Classification Model” in DIGITS.
However, when I press Create, the following appears:

Job Status Running
Initialized at 09:31:01 AM (1 second)
Running at 09:31:02 AM
Train Caffe Model Running
Estimated time remaining: ?

Initialized at 09:31:01 AM (1 second)
Running at 09:31:02 AM
GeForce GT 730 (#0)
1.2 GB / 1.95 GB (61.4%)
64 °C
Process #6171
CPU Utilization
900 MB (11.4%)

But after 10 minutes the error appears:

Train Caffe Model Error
Initialized at 09:13:51 AM (1 second)
Running at 09:13:52 AM (11 minutes, 13 seconds)
Error at 09:25:06 AM
(Total - 11 minutes, 15 seconds)
ERROR: Out of memory: failed to allocate 12845056 bytes on device 0
This network produces output loss2/accuracy-top5
This network produces output loss2/loss
Network initialization done.
Solver scaffolding done.
Starting Optimization
Learning Rate Policy: step
Iteration 0, Testing net (#0)
Ignoring source layer train-data
Ignoring source layer label_train-data_1_split
Test net output #0: accuracy = 0.0214326
Test net output #1: accuracy-top5 = 0.7368
Test net output #2: loss = 2.27526 (* 1 = 2.27526 loss)
Test net output #3: loss1/accuracy = 0.0955242
Test net output #4: loss1/accuracy-top5 = 0.697909
Test net output #5: loss1/loss = 4.35742 (* 0.3 = 1.30723 loss)
Test net output #6: loss2/accuracy = 0.0842165
Test net output #7: loss2/accuracy-top5 = 0.703586
Test net output #8: loss2/loss = 2.27488 (* 0.3 = 0.682465 loss)
Out of memory: failed to allocate 12845056 bytes on device 0
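For what it's worth, the failed allocation itself is fairly small; a quick back-of-envelope using the numbers from the job status above (illustrative arithmetic only; the memory actually available also depends on the display, other processes, and fragmentation) shows the card was close to full:

```python
# Numbers taken from the error message and job status above.
failed_alloc = 12845056          # bytes Caffe could not allocate
total_mem = 1.95 * 1024**3       # GT 730 capacity reported by DIGITS, ~1.95 GB
used_mem = 1.2 * 1024**3         # usage shown in the job status snapshot

print(round(failed_alloc / 2**20, 2))         # size of the failed request in MiB
print(round((total_mem - used_mem) / 2**20))  # nominally free MiB at snapshot time
```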

Is there anything else I can check or must I switch to a newer GPU?

Thanks, Yoni.

Hi Dustin,

Changing the Batch Size to 2 and the Accumulation to 5 solved the problem (though I still don’t have the option to choose the GPU in DIGITS). Now it’s running.

By the way, do you think a GeForce 940MX (2GB) is more capable than the GT 730 (2GB)?
How important is the memory size (2 vs. 4 GB)?

Thank you, Yoni.

OK great. Technically you should be able to train this way, although it may not be completely ideal.

It seems that although the GPU selection menu isn’t listing your card, the Job Status output shows it is still being detected correctly. That should be OK since you have one GPU; the menu is really for selecting among multiple GPUs.

Memory capacity is typically very important for training; without enough memory, you may not be able to complete the job on some networks.
For upgrading from a lower-end card I would recommend the GeForce GTX 1060 6GB. Here is one example compact version.
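To give a feel for why memory matters, here is a rough, illustrative estimate of just one activation blob (the conv1 output of the standard AlexNet definition) in float32. Caffe also keeps a gradient buffer of the same size, and a real network has many more layers, so the totals add up quickly on a 2GB card:

```python
# AlexNet conv1 output dimensions (96 filters, 55x55 spatial output)
# with a hypothetical training batch of 128; float32 is 4 bytes.
batch, channels, height, width = 128, 96, 55, 55
bytes_per_float = 4

blob_bytes = batch * channels * height * width * bytes_per_float
print(round(blob_bytes / 2**20))  # MiB for a single data blob, before gradients
```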

Thank you very much Dustin.