I have the following installed on my Xavier NX. When I try to run my trained SSD model (based on MobileNet SSD) with TRT optimization (FP16)… I found it runs slowly (about 2 frames per second)…
Is there a compatibility problem between CUDA (v10.2) and the TF version (1.15.2+nv20.6), such that it's not using the GPU? Any suggestions? Thanks
Originally, I flashed using jetson-nx-developer-kit-sd-card-image (it comes with JetPack 4.4)
After installation, the CUDA version shows as 10.2.89
The TRT version is 7.1.3
I followed Jetson Zoo to install TensorFlow 1.15 for JetPack 4.4 (pip3 list shows “1.15.2+nv20.6”)
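As a quick sanity check on the GPU question above (a standard TF 1.15 check, not from the original post), you can ask TensorFlow directly whether it sees the GPU:

```python
import tensorflow as tf
from tensorflow.python.client import device_lib

# On a healthy JetPack 4.4 + TF 1.15 install, this prints True
# and the device list includes a "/device:GPU:0" entry.
print(tf.test.is_gpu_available(cuda_only=True))
print([d.name for d in device_lib.list_local_devices()])
```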
(3) I do convert my model to TensorRT (see my partial code below)… Is my code below the right way, or are there any links for me to check/study? Please note that my_frozen_graph is based on SSD MobileNet, but it's transfer-learned and the input image size is about 4 times the 300x300 image size.
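The partial code did not come through in the post, but a typical TF-TRT FP16 conversion on TF 1.15 looks roughly like this; the graph path and the output node names are placeholders (the node names shown are the usual TF Object Detection API SSD outputs, not necessarily the poster's):

```python
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Load the transfer-learned frozen graph (placeholder path).
with tf.io.gfile.GFile("my_frozen_graph.pb", "rb") as f:
    frozen_graph = tf.compat.v1.GraphDef()
    frozen_graph.ParseFromString(f.read())

# Replace supported subgraphs with TensorRT FP16 engines.
converter = trt.TrtGraphConverter(
    input_graph_def=frozen_graph,
    nodes_blacklist=["detection_boxes", "detection_scores",
                     "detection_classes", "num_detections"],  # output nodes
    precision_mode="FP16",
    max_batch_size=1,
    is_dynamic_op=True)  # SSD has dynamic shapes, so build engines at runtime
trt_graph = converter.convert()
# trt_graph can then be imported into a session for inference.
```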
1. Although Max-N only enables two CPUs, it does have a higher clock rate.
If your task is GPU-intensive, it should give better performance. (See the mode-switching commands below.)
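For reference, switching power modes and maxing the clocks on Jetson is done with nvpmodel and jetson_clocks; the mode index varies per board and JetPack release, so query the current mode first:

```bash
sudo nvpmodel -q     # show the current power mode
sudo nvpmodel -m 0   # switch mode; the index depends on board/JetPack
sudo jetson_clocks   # pin clocks to the maximum for the active mode
```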
2. Swap memory tends to be slower.
Would you mind trying the following to control the memory usage of TensorFlow:
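(The snippet itself didn't survive the quote; the standard TF 1.x controls are `allow_growth` or a fixed memory fraction, sketched here:)

```python
import tensorflow as tf

config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True  # grab GPU memory on demand
# config.gpu_options.per_process_gpu_memory_fraction = 0.5  # or cap it
sess = tf.compat.v1.Session(config=config)
```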
3. Sorry that my comment may have caused some confusion.
The approach you used is called TF-TRT, which embeds TensorRT into the TensorFlow framework.
This approach consumes a lot of memory, since you need to load TensorRT as well as TensorFlow.
Thanks so much for the guidance. It’s very helpful.
I took a look at the link https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleUffSSD
I see it's based on UFF and C++. The performance looks promising.
Meanwhile, I am looking for a Triton Inference Server based solution, because we already use Triton Inference Server in the cloud…
(1) In the cloud, we usually just load our trained TensorFlow SavedModel (pb) into trtserver via nvidia-docker.
An example run is below… My question is: for Jetson Xavier NX, can I do a similar trtserver thing?
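(The example itself was cut off; for reference, a cloud-side run of that era looks something like the following, where the image tag, ports, and model-repository path are illustrative, not the poster's actual values:)

```bash
nvidia-docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/model_repository:/models \
  nvcr.io/nvidia/tensorrtserver:20.02-py3 \
  trtserver --model-repository=/models
```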