Problem with RAM usage and Caffe models on the Jetson TX1

Hi,

I am testing two Caffe models (VGG_FACE and ResNet-50) on the Jetson TX1 (32-bit L4T, R24). I find that after the model weights are loaded into memory by the caffe.Net(...) call, the subsequent call to net.forward() significantly increases the amount of memory used by the Jetson.

This is the example code I use to test the memory usage of the two models:

import numpy as np
import caffe

run_model = 2
im_size = 224
if run_model == 1:
    model_def = '/media/ubuntu/9016-4EF8/caffe/models/ResNet/ResNet-50-deploy.prototxt'
    model_weights = '/media/ubuntu/9016-4EF8/caffe/models/ResNet/ResNet-50-model.caffemodel'
elif run_model == 2:
    model_def = '/media/ubuntu/9016-4EF8/vgg_face_caffe/VGG_FACE_deploy.prototxt'
    model_weights = '/media/ubuntu/9016-4EF8/vgg_face_caffe/VGG_FACE.caffemodel'

caffe.set_device(0)
caffe.set_mode_gpu()
net = caffe.Net(model_def, model_weights, caffe.TEST)

# load the mean ImageNet image (as distributed with Caffe) for subtraction
mu = np.load('./caffe-fast-rcnn/python/caffe/imagenet/ilsvrc_2012_mean.npy')
mu = mu.mean(1).mean(1)  # average over pixels to obtain the mean (BGR) pixel values
print 'mean-subtracted values:', zip('BGR', mu)

# create transformer for the input called 'data'
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))  # move image channels to outermost dimension
transformer.set_mean('data', mu)            # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255)      # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2, 1, 0))  # swap channels from RGB to BGR

# set the size of the input (we can skip this if we're happy
#  with the default; we can also change it later, e.g., for different batch sizes)
net.blobs['data'].reshape(1,        # batch size
                          3,         # 3-channel (BGR) images
                          im_size, im_size)  # 224x224 input for both models

image = caffe.io.load_image('./caffe-fast-rcnn/examples/images/cat.jpg')
transformed_image = transformer.preprocess('data', image)
net.blobs['data'].data[...] = transformed_image  # copy the image data into the memory allocated for the net

### perform classification
output = net.forward()

output_prob = output['prob'][0]  # the output probability vector for the first image in the batch
print 'predicted class is:', output_prob.argmax()
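
For anyone who wants to reproduce the per-step numbers, this is roughly how the usage can be sampled from inside the script (a minimal sketch that reads MemFree from /proc/meminfo, so page-cache activity adds some noise; the helper name is just for illustration):

def mem_free_mb():
    # Parse MemFree from /proc/meminfo (the value is reported in kB).
    with open('/proc/meminfo') as f:
        for line in f:
            if line.startswith('MemFree:'):
                return int(line.split()[1]) / 1024.0

base = mem_free_mb()
net = caffe.Net(model_def, model_weights, caffe.TEST)
print 'model load used:   %.0f MB' % (base - mem_free_mb())

before_forward = mem_free_mb()
output = net.forward()
print 'forward pass used: %.0f MB' % (before_forward - mem_free_mb())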

After the models are loaded into memory, an extra 600 MB is allocated during the forward pass for VGG_FACE and 300 MB for ResNet-50. This brings the total RAM used by the Jetson to 2.2 GB and 1.7 GB, respectively (note that I am not running any other scripts). I made a table comparing the two models:

Model Name    Disk Space    RAM after model load (caffe.Net)    RAM after forward pass (net.forward)
VGG_FACE      580 MB        1 GB                                1.6 GB
ResNet-50     102 MB        600 MB                              900 MB

Model Name    Total RAM    RAM after program ends    RAM after dropping caches
VGG_FACE      2.2 GB       1.1 GB                    ~600 MB
ResNet-50     1.7 GB       1.1 GB                    ~600 MB

This is a limiting factor for my project and, I feel, a waste of RAM. I have searched online but could not find a satisfactory answer to this problem.

Why does this increase in memory occur? Since I am running in GPU mode, shouldn't the Jetson's unified memory avoid this increase?

Also, when the program finishes execution, I've noticed that RAM does not go back to the initial ~500-600 MB used by the operating system; instead, it stays at ~1.1 GB. When I manually run

echo 3 | sudo tee /proc/sys/vm/drop_caches

to release the page cache, RAM goes back down to ~600 MB.
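
The same check can be scripted, e.g. (a small sketch; it must run as root, and a sync is needed first so dirty pages are flushed and the clean cache can actually be dropped):

import os

def mem_free_mb():
    with open('/proc/meminfo') as f:
        for line in f:
            if line.startswith('MemFree:'):
                return int(line.split()[1]) / 1024.0

os.system('sync')  # flush dirty pages so clean caches can be dropped
before = mem_free_mb()
with open('/proc/sys/vm/drop_caches', 'w') as f:
    f.write('3\n')
print 'released from page cache: %.0f MB' % (mem_free_mb() - before)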

Is this also expected behavior of the Jetson's memory management? Am I missing something obvious here? (Sorry, I am just starting with CUDA programming.)

Thank you very much in advance

Cheers

Hello, Lisan:
On the Jetson, the CPU and GPU share the same global DRAM, so even in GPU mode a CUDA-accelerated program occupies that shared system memory. The increase at net.forward() is because the intermediate activation blobs are allocated at that point, on top of the weights loaded earlier, and they come out of the same pool.
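
You can see the shared pool from Python by querying the CUDA runtime: on Jetson, cudaMemGetInfo() reports the shared system DRAM rather than dedicated video memory. A quick sketch using ctypes (the exact libcudart soname may differ between L4T releases):

import ctypes

# Load the CUDA runtime; the soname may differ on your L4T release.
cudart = ctypes.CDLL('libcudart.so')

free_b = ctypes.c_size_t()
total_b = ctypes.c_size_t()
ret = cudart.cudaMemGetInfo(ctypes.byref(free_b), ctypes.byref(total_b))
assert ret == 0, 'cudaMemGetInfo failed with error %d' % ret

# On Jetson, 'total' is the board's shared DRAM, so 'free' also shrinks
# when the CPU side allocates memory.
print 'GPU-visible memory: %d MB free / %d MB total' % (
    free_b.value >> 20, total_b.value >> 20)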

As for drop_caches, that is standard Linux page-cache behavior and has nothing to do with the Jetson specifically: the kernel keeps recently read files (here, the .caffemodel weights) cached in otherwise free RAM until something else needs it.

br
ChenJian