Unable to run optimised network on TX2 using TensorFlow GPU

I have been trying to run a segmentation network on a TX2. It primarily gives a memory-allocation error, and changing different options in TensorFlow just pops up different errors, all of which seem related to memory allocation. I am able to run the network on the TX2 in CPU-only mode at a rate of 1 frame per 10 seconds, but even after graph transforms as well as TensorRT optimization I get the same error.
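For reference, the kind of session options I have been changing (a minimal sketch with example values; none of them avoided the error):

```python
import tensorflow as tf  # TF 1.x API; on TF 2.x the same options live under tf.compat.v1
if hasattr(tf, "compat") and hasattr(tf.compat, "v1"):
    tf = tf.compat.v1

config = tf.ConfigProto()
# Grow GPU allocations on demand instead of grabbing everything up front.
config.gpu_options.allow_growth = True
# Alternatively, hard-cap TensorFlow's share of the TX2's (shared) memory.
config.gpu_options.per_process_gpu_memory_fraction = 0.5
# sess = tf.Session(config=config)  # pass the config when creating the session
```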

Can you provide any insights?

My TensorFlow version is 1.8, running on Ubuntu 16.04 with CUDA 9.

Hi,

It’s recommended to upgrade the system to our latest JetPack software first.

Are you using TF-TRT?
In GPU mode, TF-TRT tends to allocate memory twice (once for the CPU copy and once for the GPU copy), which usually leads to an OOM error.
It’s recommended to use pure TensorRT instead:
https://github.com/NVIDIA-AI-IOT/tf_to_trt_image_classification

Thanks.

Hi AastaLLL,

Thanks for your input. To try your recommendation, I installed JetPack 4.2 on the TX2, with CUDA 10 and everything. Now I am unable to find a TensorFlow GPU build for 4.2.2. I want to work with Python 2 due to ROS compatibility. Could you help me with that?

Also, the OOM error occurs even without TRT optimization. I froze the graph and tried to run it before optimizing, which yields the same error.
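For reference, the freezing step looks roughly like this (a toy two-op graph standing in for my segmentation network; the real script uses my checkpoint and output node names):

```python
import tensorflow as tf  # TF 1.x API; on TF 2.x use tf.compat.v1
if hasattr(tf, "compat") and hasattr(tf.compat, "v1"):
    tf = tf.compat.v1

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, [None, 4], name="input")
    # use_resource=False keeps classic VariableV2 nodes, as in TF 1.8.
    w = tf.get_variable("w", [4, 2], use_resource=False)
    y = tf.identity(tf.matmul(x, w), name="output")
    with tf.Session(graph=graph) as sess:
        sess.run(tf.global_variables_initializer())
        # Replace every variable with a Const holding its current value.
        frozen = tf.graph_util.convert_variables_to_constants(
            sess, graph.as_graph_def(), ["output"])

# The frozen GraphDef can then be serialized to a .pb file:
# with open("frozen.pb", "wb") as f:
#     f.write(frozen.SerializeToString())
```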

Thanks.

Hi,

We provide some prebuilt TensorFlow packages for Jetson users.
Please check this document for information:
https://docs.nvidia.com/deeplearning/frameworks/install-tf-jetson-platform/index.html

If you are facing a memory issue, it’s recommended to use pure TensorRT instead.
By default, TF-TRT duplicates the pipeline for both TensorFlow and TensorRT, which consumes a lot of memory.

Thanks.

Hi,

Thank you for looking into the issue.
I updated my TX2 to JetPack 4.2 with CUDA 10 and TensorFlow 1.14, converted my script to Python 3, and built cv_bridge with ROS to support Python 3. The network now runs absolutely fine when loading weights via the checkpoint (ckpt) method. Even without optimizing for inference or applying graph transforms, it is able to load and run, which makes me think there was some issue in the CUDA 9 memory-allocation technique, as no OOM error is present now.
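For reference, the checkpoint-loading path that now works looks roughly like this (a toy graph in place of the segmentation network; paths and names are illustrative):

```python
import os
import tempfile
import tensorflow as tf  # TF 1.14 here; on TF 2.x use tf.compat.v1
if hasattr(tf, "compat") and hasattr(tf.compat, "v1"):
    tf = tf.compat.v1

ckpt_path = os.path.join(tempfile.mkdtemp(), "model.ckpt")

# Build and save a toy model (stands in for the training-side script).
g = tf.Graph()
with g.as_default():
    x = tf.placeholder(tf.float32, [None, 4], name="input")
    w = tf.get_variable("w", [4, 2])
    y = tf.identity(tf.matmul(x, w), name="output")
    saver = tf.train.Saver()
    with tf.Session(graph=g) as sess:
        sess.run(tf.global_variables_initializer())
        saver.save(sess, ckpt_path)

# Inference side: rebuild the graph from the .meta file and restore weights.
g2 = tf.Graph()
with g2.as_default():
    with tf.Session(graph=g2) as sess:
        restorer = tf.train.import_meta_graph(ckpt_path + ".meta")
        restorer.restore(sess, ckpt_path)
        out = sess.run("output:0", {"input:0": [[1.0, 2.0, 3.0, 4.0]]})
```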
Also, I was facing some issues with graph transforms and freezing the graph due to batch norm in the network, and those are still there: after the transform or freezing, when loading the graph from the .pb file, an error occurs saying that a float was expected but a float_ref was passed, which I assume is a TensorFlow error. I understand this isn’t the correct place to ask this, but if you have any idea about it, please do let me know.
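For reference, a workaround I found in community posts rewrites the ref-typed ops that batch norm’s moving averages leave behind in the frozen GraphDef (adapted, not yet verified on my network; the "moving_" name pattern is an assumption about how the batch-norm variables are named):

```python
def fix_batchnorm_ref_ops(graph_def):
    """Rewrite ops that still expect float_ref inputs so a frozen
    graph containing batch norm loads cleanly. Mutates graph_def in place."""
    for node in graph_def.node:
        if node.op == "RefSwitch":
            node.op = "Switch"
            # Point the switch at the read tensor of the moving statistics.
            for i, name in enumerate(node.input):
                if "moving_" in name:
                    node.input[i] = name + "/read"
        elif node.op == "AssignSub":
            node.op = "Sub"
            if "use_locking" in node.attr:
                del node.attr["use_locking"]
        elif node.op == "AssignAdd":
            node.op = "Add"
            if "use_locking" in node.attr:
                del node.attr["use_locking"]
    return graph_def
```

I would call fix_batchnorm_ref_ops(graph_def) on the GraphDef parsed from the .pb file, before passing it to tf.import_graph_def.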
Thanks again.