Slow to run TensorFlow ResNet - how do I increase the RAM available to the GPU?

As you can see from this asciinema:

https://asciinema.org/a/BDXg9L9cGR3u4yFWifLAt5upu

My Jetson Nano is only giving TensorFlow 459MB of GPU RAM to work with. And that is only because I closed down other applications; when I was running Chrome, the GPU RAM available to TensorFlow was less than 200MB, causing the benchmark to fail.

How can I create a fixed RAM allocation to the GPU of, say, 1 or 1.5 GB?

Also, is there any way to increase performance? 1.7 on ResNet seems very slow. Perhaps more RAM with a bigger batch size would help? I had to decrease the batch size to 1 from the default of 32. (Also, please keep in mind that to replicate this you need to check out the 1.13 branch of the TensorFlow benchmarks and use Python 3.)

The Nano has 4GB of RAM, and only somewhat less than 2GB of it can be assigned to the GPU side.

On Jetson, a TensorRT engine in FP16 mode is the preferred way to run inference.
If you stick with TensorFlow, there are also parameters that can be used to cap the GPU allocation.

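import tensorflow as tf  # TensorFlow 1.x API

config = tf.ConfigProto()
# Grow the GPU allocation on demand instead of reserving it all up front
config.gpu_options.allow_growth = True
# Cap this process at 40% of the GPU memory
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config)  # plus any other Session arguments you need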

But that benchmark is actually training ResNet, as far as I know. Will the above still work, and what do those two options, allow_growth and per_process_gpu_memory_fraction, actually mean?

I have another question. If I add a swap file, can it also be used by the GPU? I assume it can, since the GPU uses main memory? Obviously it would be slow, but perhaps I could configure the system so that TensorFlow or Python never swaps?

Hi,

allow_growth means memory is allocated little by little, as it is needed, instead of in one big chunk.
per_process_gpu_memory_fraction caps the memory allocation at the specified fraction.
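
To relate the fraction to a budget such as the 1 or 1.5 GB asked about earlier, here is a minimal sketch of the arithmetic. The helper name memory_fraction and the 4096 MB total are assumptions for illustration only. The total that TensorFlow actually sees on a Nano may differ from the nominal 4GB, so treat the result as a starting point rather than a guarantee.

```python
def memory_fraction(target_mb, total_mb):
    """Return the fraction of total_mb that corresponds to target_mb.

    Hypothetical helper: total_mb is whatever total GPU memory TensorFlow
    reports on your device (assumed here, not measured).
    """
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    if target_mb <= 0:
        raise ValueError("target_mb must be positive")
    # A fraction above 1.0 makes no sense, so clamp it.
    return min(1.0, float(target_mb) / float(total_mb))


# Assumed nominal total of 4096 MB for a 4GB Nano.
one_gb = memory_fraction(1024, 4096)
one_and_a_half_gb = memory_fraction(1536, 4096)
```

The returned value could then be assigned to config.gpu_options.per_process_gpu_memory_fraction. Since the fraction is a cap, the request can still fail if other applications are already using the shared memory.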

It’s known that TensorFlow’s memory allocation behavior is weird on the Jetson.
That’s because the Jetson shares its physical memory between the CPU and GPU, which causes some unexpected results in TensorFlow.

In general, a reboot will help, since it releases all the memory and increases the amount available to TensorFlow.
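
Since the memory is shared, one rough way to see how much is free before launching TensorFlow is to read /proc/meminfo. The sketch below is only an illustration: the function names are made up here, and it assumes that the kernel's MemAvailable figure is a reasonable proxy for what the GPU side can obtain, which may not hold exactly on Jetson.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 kB" lines.
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def available_mb(info):
    """Return MemAvailable in MB, or None if the field is missing."""
    kb = info.get("MemAvailable")
    return None if kb is None else kb / 1024.0


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the memory statistics file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

On the Nano, calling available_mb(read_meminfo()) before and after closing other applications, or after a reboot, shows how much the available amount changes. The SwapFree field in the same dictionary reports unused swap space.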
Thanks.

Okay, thank you. Do you recommend PyTorch instead? Perhaps its behaviour is more predictable?

Hi,

It’s recommended to convert your model to TensorRT directly.
We have optimized its performance and memory usage for Jetson.

You can find conversion samples for TensorFlow/Caffe/PyTorch in our documentation:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-sample-support-guide/index.html

Thanks.