Hi,
I am testing two Caffe models (VGG_FACES and ResNet-50) on the Jetson TX1 (32-bit, L4T R24). I find that after the model weights are loaded into memory (the caffe.Net(...) call), the call to net.forward() significantly increases the amount of memory used by the Jetson.
This is the example code I use to test the memory usage of the two models:
import numpy as np
import caffe
run_model = 2  # 1 = ResNet-50, 2 = VGG_FACE
im_size = 224  # input images are 224x224
if run_model == 1:
    model_def = '/media/ubuntu/9016-4EF8/caffe/models/ResNet/ResNet-50-deploy.prototxt'
    model_weights = '/media/ubuntu/9016-4EF8/caffe/models/ResNet/ResNet-50-model.caffemodel'
elif run_model == 2:
    model_def = '/media/ubuntu/9016-4EF8/vgg_face_caffe/VGG_FACE_deploy.prototxt'
    model_weights = '/media/ubuntu/9016-4EF8/vgg_face_caffe/VGG_FACE.caffemodel'
caffe.set_device(0)
caffe.set_mode_gpu()  # run on the GPU
net = caffe.Net(model_def, model_weights, caffe.TEST)  # load the architecture and weights
# load the mean ImageNet image (as distributed with Caffe) for subtraction
mu = np.load('./caffe-fast-rcnn/python/caffe/imagenet/ilsvrc_2012_mean.npy')
mu = mu.mean(1).mean(1) # average over pixels to obtain the mean (BGR) pixel values
print 'mean-subtracted values:', zip('BGR', mu)
# create transformer for the input called 'data'
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1)) # move image channels to outermost dimension
transformer.set_mean('data', mu) # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255) # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2, 1, 0)) # swap channels from RGB to BGR
# set the size of the input (we can skip this if we're happy
# with the default; we can also change it later, e.g., for different batch sizes)
net.blobs['data'].reshape(1,                 # batch size
                          3,                 # 3-channel (BGR) images
                          im_size, im_size)  # image size is 224x224
image = caffe.io.load_image('./caffe-fast-rcnn/examples/images/cat.jpg')
transformed_image = transformer.preprocess('data', image)
net.blobs['data'].data[...] = transformed_image # copy the image data into the memory allocated for the net
### perform classification
output = net.forward()
output_prob = output['prob'][0] # the output probability vector for the first image in the batch
print 'predicted class is:', output_prob.argmax()
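For reference, this is roughly how I read the RAM figures reported below. It is only a small sketch based on /proc/meminfo (the helper name mem_used_mb is just for illustration; I also cross-check the numbers with free -m and tegrastats):
def mem_used_mb():
    # approximate "used" RAM in MB: MemTotal - MemFree - Buffers - Cached (values in /proc/meminfo are in kB)
    info = {}
    with open('/proc/meminfo') as f:
        for line in f:
            key, value = line.split(':')
            info[key.strip()] = int(value.split()[0])
    return (info['MemTotal'] - info['MemFree'] - info['Buffers'] - info['Cached']) / 1024

print 'RAM used before loading the net: %d MB' % mem_used_mb()
net = caffe.Net(model_def, model_weights, caffe.TEST)
print 'RAM used after loading the net: %d MB' % mem_used_mb()
# ... preprocessing as in the script above ...
output = net.forward()
print 'RAM used after the forward pass: %d MB' % mem_used_mb()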
After the models are loaded into memory, an extra 600 MB of memory is allocated during the forward pass for VGG_FACES and 300 MB for ResNet-50. This brings the total RAM used by the Jetson to 2.2 GB and 1.7 GB, respectively (note that I am not running any other scripts). I made a table comparing the two models:
Model Name | Disk Space | RAM after model load (caffe.Net) | RAM after forward pass (net.forward())
VGG FACES  | 580 MB     | 1 GB                             | 1.6 GB
ResNet-50  | 102 MB     | 600 MB                           | 900 MB
Model Name | Total RAM | RAM after program ends | RAM after clearing the cache
VGG FACES  | 2.2 GB    | 1.1 GB                 | ~600 MB
ResNet-50  | 1.7 GB    | 1.1 GB                 | ~600 MB
This is a limiting factor for my project and, I feel, a waste of RAM. I've searched online but could not find a satisfactory answer to this problem.
Why does this increase in memory occur? If I am running in GPU mode, shouldn't the Jetson's unified memory avoid this memory increase?
Also, when the program finishes execution, I've noticed that RAM does not go back to the initial ~500-600 MB used by the operating system. Instead, it stays at ~1.1 GB. When I manually run
echo 3 | sudo tee /proc/sys/vm/drop_caches
to drop the caches, RAM goes back down to ~600 MB.
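If it helps to explain the numbers, this is how I check how much of that memory is just sitting in the page cache rather than being held by a process (a minimal sketch reusing the /proc/meminfo fields; I run it before and after dropping the caches):
# print the Buffers and Cached fields of /proc/meminfo (values in kB)
with open('/proc/meminfo') as f:
    for line in f:
        if line.startswith('Buffers:') or line.startswith('Cached:'):
            print line.strip()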
Is this also expected behavior of the Jetson's memory management? Am I missing something obvious here (sorry, I am just starting with CUDA programming)?
Thank you very much in advance
Cheers