I am running the official TensorFlow build on the Jetson Nano for an inference workload. My program works on other platforms, but the Jetson version of TensorFlow uses all 4 GB of RAM when running inference with a batch size of 1 on a single ssd_inception_v2 network on the GPU.
The standard solution would be to set gpu_options.per_process_gpu_memory_fraction to a low percentage, but this value is ignored and all memory is allocated at inference time.
Has anyone experienced similar problems? Is this specific to NVIDIA's TensorFlow build?
The only workaround I have found so far is to switch to pure TensorRT, but converting models is very cumbersome because most conversion tools are written for TensorFlow-TensorRT (TF-TRT).
This is a general TensorFlow issue, not one specific to our official build.
Could you check whether it works if you also set allow_growth=True?
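import tensorflow as tf  # TF 1.x API; on TF 2.x use tf.compat.v1.ConfigProto / tf.compat.v1.Session

config = tf.ConfigProto()
# Allocate GPU memory on demand instead of reserving it all up front.
config.gpu_options.allow_growth = True
# Cap this process at 40% of the GPU memory TensorFlow sees.
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config)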
Thanks for your answer.
Setting allow_growth = True does not help, and neither does setting both options. In both cases, the RAM fills up before I can run even a single detection with a batch size of 1.
Do you have any other ideas?
Just to give some context: I was able to convert some of the TensorFlow models to UFF files and build TensorRT engines that each consume less than 500 MB during inference. The same models run through regular TensorFlow consume more than 3 GB and crash either the process or the whole machine because no RAM is left.
I ran into this while trying to use the ZED/TensorFlow demo. The solution was to create a swap file.
It looks like the physical memory does not meet the minimum RAM requirement,
so you will need to add a swap file.
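For reference, here is a minimal sketch of the usual Linux steps for adding a swap file: fallocate, chmod, mkswap and swapon, plus an /etc/fstab line so the swap survives a reboot. The path /swapfile and the 4G size are illustrative assumptions, not values from this thread. The script only prints the commands unless it is run as root with --apply.

```python
from __future__ import annotations

import shlex
import subprocess
import sys

# Illustrative defaults (assumptions): adjust the path and size for your board.
SWAP_PATH = "/swapfile"
SWAP_SIZE = "4G"


def swap_setup_commands(path: str = SWAP_PATH, size: str = SWAP_SIZE) -> list[list[str]]:
    """Return the standard Linux commands that create and enable a swap file."""
    return [
        ["fallocate", "-l", size, path],  # reserve the space on disk
        ["chmod", "600", path],           # swap must not be readable by other users
        ["mkswap", path],                 # write the swap signature
        ["swapon", path],                 # enable it for the running system
    ]


def fstab_entry(path: str = SWAP_PATH) -> str:
    """Line to append to /etc/fstab so the swap file is enabled at boot."""
    return f"{path} none swap sw 0 0"


def main(apply: bool = False) -> None:
    for cmd in swap_setup_commands():
        print(shlex.join(cmd))
        if apply:
            # Requires root; check=True stops at the first failing step.
            subprocess.run(cmd, check=True)
    print(f"# then append to /etc/fstab: {fstab_entry()}")


if __name__ == "__main__":
    # Dry run by default; pass --apply (as root) to actually run the commands.
    main(apply="--apply" in sys.argv[1:])
```

Swap lets the process get past the allocation peak, but any inference that actually spills into swap on an SD card will be much slower than running from RAM.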
Thank you, I will try that.
Can you explain why RAM usage differs so much between TensorRT and TF-TRT?
The same network uses ~500 MB in TensorRT but needs 1.6 GB in TF-TRT.
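To compare figures like these on an equal footing, one option is to read system-wide memory from /proc/meminfo before loading the model and again after the first inference. On Jetson the CPU and GPU share the same physical RAM, so MemTotal minus MemAvailable should also reflect GPU allocations that a per-process figure might miss (tegrastats reports the same overall RAM usage). This is a sketch under that assumption. The commented-out run_inference call is a hypothetical placeholder for your own TensorRT or TF-TRT code.

```python
from __future__ import annotations

from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Map /proc/meminfo field names to their integer values (kB for memory fields)."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def used_mb(fields: dict[str, int]) -> float:
    """System-wide RAM in use: total minus what the kernel reports as available."""
    return (fields["MemTotal"] - fields["MemAvailable"]) / 1024


def sample_used_mb(meminfo: Path = Path("/proc/meminfo")) -> float:
    """Read /proc/meminfo (Linux only) and return the current used RAM in MB."""
    return used_mb(parse_meminfo(meminfo.read_text()))


if __name__ == "__main__":
    if Path("/proc/meminfo").exists():
        before = sample_used_mb()
        # Hypothetical placeholder: load the model and run one inference here, e.g.
        # run_inference(image)
        after = sample_used_mb()
        print(f"used before: {before:.0f} MB, after: {after:.0f} MB, "
              f"delta: {after - before:.0f} MB")
```

The delta includes anything else that started in the meantime, so it is best taken on an otherwise idle board and repeated a few times.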
When TensorFlow is used with TensorRT (TF-TRT), it is known to keep two implementations of the model:
a TF-based version and a TRT version.
This may be related to the pipeline mechanism and the fallback path (running the native TF version when a TRT engine cannot be used).
So in general, it takes roughly twice as much memory.