DetectNet-COCO-Dog error

Hi,

I’ve been trying to follow the jetson-inference guide, but when training the DetectNet-COCO-Dog model, I got an error message soon after training started:

[ERROR] Train Caffe Model: Out of memory: failed to allocate 39321600 bytes on device 0
[ERROR] Train Caffe Model task failed with error code -6

Does this mean I need a computer with at least 40 GB of memory? Would increasing the swap partition help in this case? Thanks!

David Huang

Hi uwdaveh, 39321600 bytes is only 37.5 MB. Run nvidia-smi to verify your GPU is detected and to check how much GPU memory is free. Are you sure your GPU driver and DIGITS + Caffe + cuDNN are installed and working correctly? Are you able to train any of the other networks from the tutorial, like imageNet or segNet?
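To put the number in perspective, the failed allocation itself is tiny; the question is how little free GPU memory was left when Caffe requested it:

```python
# Convert the failed allocation size from the error message to MiB.
MIB = 1024 ** 2  # 1 MiB = 2**20 bytes
print(39321600 / MIB)  # 37.5 -- a small allocation, so the GPU was
                       # likely already near full when it failed
```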

If you still have problems training DetectNet, try reducing the Batch Size and adjusting the Batch Accumulation as described in this step:
https://github.com/dusty-nv/jetson-inference#selecting-detectnet-batch-size
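If I understand DIGITS correctly, Batch Accumulation maps to Caffe's iter_size, so the effective batch is Batch Size × Batch Accumulation, while per-iteration GPU memory scales only with Batch Size. A minimal sketch of the trade-off (the function name is just illustrative):

```python
# Sketch of the batch-size / accumulation trade-off: gradients are
# accumulated over several small batches before each weight update,
# so memory use follows batch_size, not the effective batch.

def effective_batch(batch_size, batch_accumulation):
    """Number of images contributing to each weight update."""
    return batch_size * batch_accumulation

# e.g. batch size 2 with accumulation 5 trains with the same effective
# batch as batch size 10, but holds only 2 images' activations at once
print(effective_batch(2, 5))   # 10
print(effective_batch(10, 1))  # 10
```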

Yeah, my packages are working correctly and imageNet trained fine. I changed the batch size to 1 and the batch accumulation to 12, but still got this error:

Train Caffe Model: Out of memory: failed to allocate 19660800 bytes on device 0

My video card is a GTX 960M, by the way. Is its 2 GB of video memory just too low? Thanks for the help!

Can you post the output of nvidia-smi? It will print out the memory usage and the amount of memory available, similar to:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.103                Driver Version: 384.103                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
| N/A   46C    P2    35W /  N/A |   1942MiB /  8112MiB |      5%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1054      G   /usr/lib/xorg/Xorg                           805MiB |
|    0      1681      G   compiz                                       432MiB |
|    0      8694      G   ...-token=09E7EB6B6E411A2691454BFF23EBBF27   360MiB |
|    0     24919      C   /usr/bin/python                              339MiB |
+-----------------------------------------------------------------------------+

If your baseline GPU usage is significant (note the Xorg display server consuming 805MiB in my case), you may want to try disabling the GUI and running headless. You can still launch DIGITS and access its web interface from a browser on another networked machine.