Jetson Xavier NX running with CUDA and TF 1.15

I have the following installed on my Xavier NX. When I try to run my trained SSD model (based on MobileNet SSD) with TRT optimization (FP16)… I found it’s running slowly (about 2 frames per second)…
Is there a compatibility problem between CUDA (v10.2) and this TF version (1.15.2+nv20.6) that prevents it from using the GPU? Any suggestions? Thanks

  • Originally, I flashed using jetson-nx-developer-kit-sd-card-image (it ships with JetPack 4.4)

  • After installation, the CUDA version shows 10.2.89
    (cat /usr/local/cuda/version.txt)

  • The TRT version is 7.1.3
    [ls /usr/lib/aarch64-linux-gnu/libnvinfer.so*]

  • I followed Jetson Zoo to install TensorFlow 1.15 for JetPack 4.4 (pip3 list shows “1.15.2+nv20.6”)

Hi,

1. Please remember to maximize the device performance first:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

2. Please check via tegrastats whether TensorFlow uses any swap memory during inference.

3. It’s recommended to convert your model to TensorRT for better performance.
In general, we can get 887.6 fps with ssd-mobilenet-v1 on Xavier NX:

Thanks.

@AastaLLL
Thanks so much for the reply; I tested it out.
It doesn’t seem to make much difference, but I have four follow-up questions… Please advise. Thanks again!

(1) I did apply the commands below… but it seems to be running on only two cores… Is that the right path to take? I thought we should try to run on more cores.

sudo nvpmodel -m 0
sudo jetson_clocks

(2) tegrastats shows it’s using swap… and the GPU is at 99%, as shown below

RAM 7082/7772MB (lfb 69x4MB) SWAP 2017/3886MB (cached 29MB) CPU [96%@1907,95%@1907,off,off,off,off] EMC_FREQ 0% GR3D_FREQ 99% AO@50C GPU@53.5C PMIC@100C AUX@50.5C CPU@52.5C thermal@52.15C VDD_IN 12045/8815 VDD_CPU_GPU_CV 8357/5514 VDD_SOC 1549/1411
RAM 7082/7772MB (lfb 69x4MB) SWAP 2017/3886MB (cached 29MB) CPU [70%@1907,75%@1907,off,off,off,off] EMC_FREQ 0% GR3D_FREQ 99% AO@50C GPU@53.5C PMIC@100C AUX@50.5C CPU@52.5C thermal@51.8C VDD_IN 11922/8844 VDD_CPU_GPU_CV 8398/5541 VDD_SOC 1508/1412

(3) I do convert my model into TensorRT (see my partial code below)… Is my code below the right way, or are there any links for me to check/study? Please note that my_frozen_graph is based on SSD MobileNet, but it’s transfer-learned and the input image size is about 4 times larger than 300x300.

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
.....
# Convert the frozen graph with TF-TRT, keeping FP16 precision
trt_graph = trt.create_inference_graph(
    input_graph_def=my_frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,  # 32 MB workspace for TensorRT
    precision_mode='FP16',
    minimum_segment_size=50
)

....
# Import the converted graph and run inference in a session
tf.import_graph_def(trt_graph, name='')
....
tf_sess.run()...
...

(4) I am using CUDA (v10.2) and TF version 1.15.2+nv20.6 …
Does CUDA v10.2 support TensorFlow 1.x (1.15.2 in my case)?

Hi,

1. Although Max-N mode only enables two CPU cores, those cores do run at a higher clock rate.
If your task is GPU-intensive, it should still give better performance.

2. Swap memory tends to be slower.
Would you mind trying to limit TensorFlow’s memory usage, for example with the session config shown below:
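(A minimal sketch for the TF 1.x session API; allow_growth and per_process_gpu_memory_fraction are the usual knobs, and the 0.5 fraction is only an example value to tune.)

import tensorflow as tf

# Cap TensorFlow's GPU memory so the rest of the system is not pushed into swap.
gpu_options = tf.GPUOptions(
    allow_growth=True,                    # allocate GPU memory on demand
    per_process_gpu_memory_fraction=0.5,  # and/or hard-cap at ~50% of GPU memory
)
config = tf.ConfigProto(gpu_options=gpu_options)
sess = tf.Session(config=config)
# ... import your (TF-TRT) graph and run inference with this session ...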

3. Sorry that my comment might have caused some confusion.
The approach you used is called TF-TRT, which embeds TensorRT inside the TensorFlow framework.
It consumes a lot of memory since you need to load TensorRT as well as TensorFlow.

It’s recommended to use our pure TensorRT API instead. You can find some examples here:
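As a rough sketch (not the exact sample code), running a prebuilt engine with the TensorRT Python API plus PyCUDA looks roughly like this; the 'model.engine' file name and the assumption that the first binding is the input are placeholders for your own model:

import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize an engine that was built offline (e.g. with trtexec or the UFF SSD sample).
with open('model.engine', 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

with engine.create_execution_context() as context:
    # Allocate host/device buffers for every binding (inputs and outputs).
    bindings, host_bufs, dev_bufs = [], [], []
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        host_buf = cuda.pagelocked_empty(size, dtype)
        dev_buf = cuda.mem_alloc(host_buf.nbytes)
        host_bufs.append(host_buf)
        dev_bufs.append(dev_buf)
        bindings.append(int(dev_buf))

    stream = cuda.Stream()
    # Copy the preprocessed image into host_bufs[0] here, then:
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
    context.execute_async(batch_size=1, bindings=bindings, stream_handle=stream.handle)
    for host_buf, dev_buf in zip(host_bufs[1:], dev_bufs[1:]):
        cuda.memcpy_dtoh_async(host_buf, dev_buf, stream)
    stream.synchronize()
    # host_bufs[1:] now hold the raw detection outputs.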

4. YES.
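You can also quickly confirm that the GPU is visible to TensorFlow with something like:

import tensorflow as tf
from tensorflow.python.client import device_lib

# On a working CUDA 10.2 + TF 1.15 install this should print True and list a GPU device.
print(tf.test.is_gpu_available(cuda_only=True))
print([d.name for d in device_lib.list_local_devices() if d.device_type == 'GPU'])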

Thanks.

@AastaLLL

Thanks so much for the guidance. It’s very helpful.
I took a look at the link https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleUffSSD
I see it uses UFF and C++. The performance looks promising.
Meanwhile, I am looking for a Triton Inference Server based solution, because we already use Triton Inference Server in the cloud…
Questions:
(1) In the cloud, we usually just load our trained TensorFlow saved_model (pb) into trtserver via nvidia-docker.
An example run is below… My question is: for Jetson Xavier NX, can I do the same trtserver thing?

sudo nvidia-docker run --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v/models/my_models/:/models nvcr.io/nvidia/tritonserver:20.03-py3 trtserver --model-repository=/models

(2) I found NVIDIA has experimental support for Triton Inference Server on Jetson


Section “Jetson JetPack Support”…
Is that the proper entry page to start with if I want to apply the trtserver approach?

Thanks.

Hi,

Sorry for the late update.

YES. If you are familiar with Triton server, it’s recommended to give it a try.
Moreover, you can also use it with the DeepStream SDK:

Thanks.