GPU Memory Error for Inferencing

Inferencing a Mask R-CNN model based on ResNet-101, and getting the error below.

7] Stats:
Limit: 74072064
InUse: 74072064
MaxInUse: 74072064
NumAllocs: 80
MaxAllocSize: 74017536

OP_REQUIRES failed at assign_op.h:117 : Resource exhausted: OOM when allocating tensor with shape[64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2019-04-16 17:29:23.989800: W tensorflow/core/common_runtime/bfc_allocator.cc:267] Allocator (GPU_0_bfc) ran out of memory trying to allocate 256B. Current allocation summary follows.
OP_REQUIRES failed at assign_op.h:117 : Resource exhausted: OOM when allocating tensor with shape[512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

OP_REQUIRES failed at assign_op.h:117 : Resource exhausted: OOM when allocating tensor with shape[512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
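Reading the allocator stats above: the BFC region is completely full, so even the tiny shape-[512] float allocation fails. A quick sanity check with the figures copied from the log:

```python
# Figures copied from the BFC allocator "Stats" block above
limit_bytes = 74072064    # total size of this BFC allocator region
in_use_bytes = 74072064   # bytes already allocated

# The failing request: a float32 tensor of shape [512]
request_bytes = 512 * 4   # float32 is 4 bytes per element

free_bytes = limit_bytes - in_use_bytes
print(free_bytes)         # 0 bytes free in the region
print(request_bytes)      # 2048 bytes requested -> OOM
```

So the failure is not about the size of any one tensor; the region is already exhausted before the 2 KB request arrives.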

Hi,

You are running out of memory.
Please reduce your batch_size or switch to a smaller model.
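If you stay on TensorFlow 1.x, you can also stop the session from grabbing all GPU memory up front via the session config. A minimal sketch (the option names below are from the TF 1.x `ConfigProto` API; the memory fraction shown is just an example value):

```python
import tensorflow as tf  # TensorFlow 1.x

config = tf.ConfigProto()
# Allocate GPU memory on demand instead of reserving it all at startup
config.gpu_options.allow_growth = True
# Or hard-cap the fraction of GPU memory this process may use (example: 50%)
config.gpu_options.per_process_gpu_memory_fraction = 0.5

with tf.Session(config=config) as sess:
    pass  # run your inference graph here
```

This will not help if the model genuinely needs more memory than the device has, but it prevents TensorFlow's default greedy allocation from starving other processes.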

Thanks.

Hi,

We are running inference on only one image with the Mask R-CNN model. Does batch size still matter in this scenario?

Are there any model limitations, such as:
1: Should the model be of a particular type (e.g. TensorFlow Lite)?
2: Should the model be quantized?
3: Should the model use a particular data type?
4: Can we run a Keras model?

Also, is there any demo code for running object detection on the NVIDIA Jetson Nano?
This would give a clearer picture.

Hi,

Would you mind sharing your TensorFlow code with us so we can take a look?
TensorFlow allocates memory for the full batch size when creating a session, so that it does not have to allocate again later.

The main reason is that the Nano only has 4 GB of memory, but your model may occupy more than that.
So we need to check whether there is any option that can decrease the memory usage.

If the memory usage fits within the device, you can use Keras as well.

Here are some object detection models for your reference:
TF: https://github.com/NVIDIA-AI-IOT/tf_trt_models
Caffe: https://github.com/dusty-nv/jetson-inference/blob/master/docs/detectnet-training.md

By the way, it’s highly recommended to port your model to pure TensorRT, since TensorFlow consumes a lot of memory.
This will require converting the .pb file into UFF and implementing some plugin layers.
https://docs.nvidia.com/deeplearning/sdk/tensorrt-sample-support-guide/index.html#uffssd_sample

Thanks.

How is swap involved here? Is it possible to add more swap to solve this?
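For reference, on an Ubuntu-based system like the Nano's, swap is typically added with the standard util-linux tools. A sketch (size, path, and the need for root are assumptions; note that swap lives in system RAM's backing store, so it can help host-side allocations but not the GPU's own memory pool):

```shell
# Create and enable a 4 GB swap file (run on the Nano; needs root)
sudo fallocate -l 4G /var/swapfile
sudo chmod 600 /var/swapfile   # swap files must not be world-readable
sudo mkswap /var/swapfile      # format the file as swap space
sudo swapon /var/swapfile      # enable it immediately
free -h                        # verify the new swap appears in the "Swap" row
```

To make the swap file persist across reboots, it would also need an entry in /etc/fstab.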