Hi,
I’ve been trying to follow the jetson-inference guide, but when training on the DetectNet-COCO-Dog image model, I got an error message soon after starting the training:
[ERROR] Train Caffe Model: Out of memory: failed to allocate 39321600 bytes on device 0
[ERROR] Train Caffe Model task failed with error code -6
Does it mean I need a computer with at least 40 gigs of memory? Will increasing the swap partition work in this case? Thanks!
David Huang
Hi uwdaveh, 39321600 bytes is 37.5 MB. Check nvidia-smi to verify your GPU and see how much GPU memory you have free. Are you sure your GPU and DIGITS + Caffe + cuDNN are installed and working correctly? Are you able to train any of the other networks from the tutorial, like imageNet or segNet?
If you still have problems training DetectNet, try changing the Batch Size and Batch Accumulation like in this step:
[url]https://github.com/dusty-nv/jetson-inference#selecting-detectnet-batch-size[/url]
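To be clear, the error reports the size of a single failed allocation in bytes, not your total memory requirement. A quick conversion (plain Python, nothing DIGITS-specific) shows it is nowhere near 40 GB:

```python
# Convert the failed allocation size from the "Out of memory" error to MiB.
failed_alloc_bytes = 39321600           # from the error message on device 0
mib = failed_alloc_bytes / (1024 ** 2)  # bytes -> MiB
print(f"{mib} MiB")                     # prints "37.5 MiB"
```

So the GPU ran out of free memory for a 37.5 MiB buffer; the total usage at that point is what nvidia-smi will tell you.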
Yeah, my packages are working correctly and imageNet was training fine. I changed the batch size to 1 and batch accumulation to 12, but still got this error:
Train Caffe Model: Out of memory: failed to allocate 19660800 bytes on device 0
My video card is a 960M, by the way. Maybe that 2 GB of video memory is just too low anyway? Thanks for the help!
Can you post the output of nvidia-smi? It will print out the memory usage and the amount of memory available, similar to:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.103                Driver Version: 384.103                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
| N/A   46C    P2    35W /  N/A |   1942MiB /  8112MiB |      5%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1054    G    /usr/lib/xorg/Xorg                            805MiB |
|    0      1681    G    compiz                                        432MiB |
|    0      8694    G    ...-token=09E7EB6B6E411A2691454BFF23EBBF27    360MiB |
|    0     24919    C    /usr/bin/python                               339MiB |
+-----------------------------------------------------------------------------+
If your baseline GPU usage is significant (note the Xorg display server consuming 805MiB in my case), you may want to try disabling the GUI and running headless. You can still launch DIGITS and access it from a web browser on another networked machine.
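For reference, on an Ubuntu machine this could look something like the sketch below. These commands assume LightDM is your display manager and that DIGITS was installed from source into ~/digits; adjust for your own setup (e.g. gdm3 instead of lightdm):

```shell
# Stop the display manager to free the GPU memory held by Xorg/compiz
# (assumes LightDM -- substitute your display manager if different)
sudo systemctl stop lightdm

# Optional: boot to the console by default from now on
sudo systemctl set-default multi-user.target

# Launch the DIGITS dev server (example path -- adjust to your install),
# then browse to http://<machine-ip>:5000 from another networked machine
cd ~/digits && ./digits-devserver
```

After training, you can restore the GUI with `sudo systemctl start lightdm` (and `sudo systemctl set-default graphical.target` if you changed the default).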