FCN-Alexnet-Aerial-FPV-720p Segmentation Out of Memory Error

Hi, apologies in advance if this is not the correct forum; there are many, but I am posting here because I’m following the inference tutorial for the TX2.

My issue is that when I get to the segmentation model portion of the tutorial (https://github.com/dusty-nv/jetson-inference/blob/master/README.md), I get the following error when trying to create the model:
ERROR: Out of memory: failed to allocate 86526804 bytes on device 0
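For scale, that failed allocation is only about 82 MiB, which a quick conversion from any terminal confirms (nothing DIGITS-specific here):

```shell
# Size of the failed allocation from the error message, in integer MiB
echo $(( 86526804 / 1024 / 1024 ))
# prints 82 -- modest on its own, so presumably earlier allocations
# had already nearly filled the card before this request failed
```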

I set both Batch Size and Batch Accumulation to 1; otherwise the configuration is as given on the website. I have a GeForce GTX 1060 GPU with 3 GB of RAM. I know this is on the lower end of modern GPUs, but I was wondering whether this segmentation task is beyond the capability of my GPU. Thanks in advance for any info; I can post logs if they would provide insight.


Are you running any other applications on the GPU at the same time?
Could you also share your nvidia-smi output with us?



No, I am not running any other GPU applications at the same time. I rebooted and ran only the DIGITS application, and it still gave the error.

Here is the output for nvidia-smi:

Mon Sep 25 02:28:21 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.66                 Driver Version: 384.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
| 38%   32C    P8     6W / 120W |    683MiB /  3010MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1019    G   /usr/lib/xorg/Xorg                             374MiB |
|    0      1705    G   compiz                                         115MiB |
|    0      2323    G   ...el-token=EA5CE8AA7C77308F990510CED881B995   130MiB |
|    0      2628    C   python2                                         59MiB |
+-----------------------------------------------------------------------------+

Hi, depending on which segnet command you are running, you could try running it from a terminal or over SSH without X11. From your nvidia-smi output, it looks like that could save you some memory, and if segnet then worked, it would confirm the problem is memory capacity. Normally, though, the inference part of the tutorial is run on the TX2.

Can you post the full command and terminal log of the run where you experienced this error?

Pardon my ignorance, but I’m not sure how to run the DIGITS segmentation model command from the command line. If there’s somewhere you can point me to, I’ll check it out and figure it out. For clarification, I’m running the section with the first screenshot.

I was running this section on the training computer.

A screenshot of the error is also posted, and the DIGITS terminal output is below.

mjpark@mjpark-MS-7A70:~/workspace/DIGITS$ ./digits-devserver 
  ___ ___ ___ ___ _____ ___
 |   \_ _/ __|_ _|_   _/ __|
 | |) | | (_ || |  | | \__ \
 |___/___\___|___| |_| |___/ 6.0.0-rc.1

Tensorflow support disabled.
2017-09-25 20:42:25 [INFO ] Loaded 5 jobs.
2017-09-25 20:45:38 [20170925-204537-884c] [WARNING] Ignoring data_param.backend ...
2017-09-25 20:45:38 [20170925-204537-884c] [WARNING] Ignoring data_param.backend ...
2017-09-25 20:45:38 [20170925-204537-884c] [WARNING] Ignoring data_param.backend ...
2017-09-25 20:45:38 [20170925-204537-884c] [WARNING] Ignoring data_param.backend ...
2017-09-25 20:45:38 [20170925-204537-884c] [DEBUG] Network sanity check - train
2017-09-25 20:45:38 [20170925-204537-884c] [DEBUG] Network sanity check - val
2017-09-25 20:45:38 [20170925-204537-884c] [DEBUG] Network sanity check - deploy
2017-09-25 20:45:38 [20170925-204537-884c] [INFO ] Train Caffe Model task started.
2017-09-25 20:45:38 [20170925-204537-884c] [INFO ] Task subprocess args: "/home/mjpark/workspace/caffe/build/tools/caffe train --solver=/home/mjpark/workspace/DIGITS/digits/jobs/20170925-204537-884c/solver.prototxt --gpu=0 --weights=/home/mjpark/workspace/DIGITS/examples/semantic-segmentation/fcn_alexnet.caffemodel"
2017-09-25 20:46:10 [20170925-204537-884c] [ERROR] Train Caffe Model: Out of memory: failed to allocate 77414400 bytes on device 0
2017-09-25 20:46:10 [20170925-204537-884c] [ERROR] Train Caffe Model task failed with error code -6

I see; above I was referring to the TensorRT program used for inferencing, but I understand now that you are talking about DIGITS. Still, you could try killing the display to free more VRAM for DIGITS. Otherwise, it would seem 3 GB is not enough memory for training FCN-Alexnet (although the 6 GB variant of the 1060 may be sufficient).
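Incidentally, the “Task subprocess args” line in your log is itself a runnable command, so as a sketch, something like this from a terminal (paths taken verbatim from your log, specific to your machine) should reproduce the same training run outside the browser:

```shell
# Retry the exact Caffe training job that DIGITS launched
# (paths copied from the DIGITS log; adjust for your own job directory)
/home/mjpark/workspace/caffe/build/tools/caffe train \
  --solver=/home/mjpark/workspace/DIGITS/digits/jobs/20170925-204537-884c/solver.prototxt \
  --gpu=0 \
  --weights=/home/mjpark/workspace/DIGITS/examples/semantic-segmentation/fcn_alexnet.caffemodel
```

If this still runs out of memory from a console with X stopped, that would pretty much confirm the 3 GB limit.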

Yes, I tried just that: I pressed “Create” and then closed the browser to see if that helped. It still didn’t work.
I just ordered the 1060 6GB version. Will try again when that one gets in and is installed. Hopefully that should do the trick.
Thanks for the feedback.

The web browser, which can be run from any machine on your local network, is unrelated to the X11 desktop running on your DIGITS machine. DIGITS is a web server and doesn’t need the desktop to run. From your nvidia-smi output, Xorg and compiz are taking up about 500 MiB of GPU memory. You could save that memory by killing X11 and starting DIGITS from the terminal (then use a laptop, etc. for the web browser). Depending on the version of Ubuntu you’re using, the command to kill the desktop is similar to ‘sudo service lightdm stop’.
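As a sketch, assuming Ubuntu with lightdm as the display manager (substitute your own display manager and DIGITS path if different):

```shell
# Stop the desktop to free the ~500 MiB used by Xorg/compiz
sudo service lightdm stop

# Confirm the X processes are gone and the memory was freed
nvidia-smi

# Start DIGITS headless, then browse to it from another machine
# (the dev server listens on port 5000 by default)
cd ~/workspace/DIGITS
./digits-devserver
```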

Let us know how it goes with the 1060 6GB version.

I got the 6 GB GeForce 1060 card in last week. With the new card, I was able to run the segmentation routine in DIGITS, so it looks like my issue was simply GPU memory capacity. Thanks for the tip.

OK great, thanks for letting us know!