Using tensorflow operations after inference on Jetson Xavier AGX

Hi! I am trying to run inference on a semantic segmentation model (DANet, from the TASM git repository) that I converted to TRT using TF-TRT.

I need this model to run in real time. Inference itself takes about 17 ms, which is fine, but the model returns a tensor in which every pixel holds its class index. Converting that tensor to an RGB image (with a fixed colour per class) takes around 80 ms no matter which way I try:

I tried converting to a NumPy array and then using np.where() for each class - the conversion to NumPy alone takes 80 ms.
I tried tf.where() on each class and then saving the image directly from the tensor without conversion - the first tf.where() call takes 80 ms and the remaining classes take ~4 ms each, but this “resets” on every frame.
I tried tf.matmul() and then converting to NumPy - the tf.matmul() call takes 80 ms.
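For reference, once the class map is on the CPU as a NumPy array, a single palette lookup with fancy indexing replaces the per-class np.where() loop entirely. This is only a sketch - the palette colours and class count below are made up:

```python
import numpy as np

# Hypothetical palette: one RGB colour per class (3 classes here).
PALETTE = np.array([
    [0, 0, 0],      # class 0: background -> black
    [0, 255, 0],    # class 1 -> green
    [255, 0, 0],    # class 2 -> red
], dtype=np.uint8)

def colorize(class_map: np.ndarray) -> np.ndarray:
    """Map an (H, W) array of class indices to an (H, W, 3) RGB image
    in one vectorized lookup instead of one np.where() per class."""
    return PALETTE[class_map]

mask = np.array([[0, 1], [2, 1]])
rgb = colorize(mask)  # shape (2, 2, 3), dtype uint8
```

This does not remove the 80 ms GPU-to-CPU conversion itself, but it keeps the post-processing after the transfer to a single pass.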

Overall it seems that TensorFlow rebuilds its structures for every frame, even though the input is always the exact same shape & dtype - so I tried creating a tf.function that only accepts tensors of this shape & dtype, so it is already ‘preloaded’, but that didn’t change anything.
So it might be a different problem: maybe the system transfers the inference result from the GPU to the CPU once I call computational functions, and this transfer takes 80 ms?
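For what it’s worth, the usual way to pin a tf.function to one shape & dtype, so that it is traced once and reused instead of retraced per frame, is an explicit input_signature. A sketch with a toy 4x4 class map and a made-up 3-colour palette (tracing avoids graph rebuilds, but it cannot fix a GPU-to-CPU transfer cost):

```python
import tensorflow as tf

# Hypothetical palette: one RGB colour per class (3 classes here).
palette = tf.constant([[0, 0, 0], [0, 255, 0], [255, 0, 0]], dtype=tf.uint8)

# Pinning the signature makes TensorFlow trace this function exactly once;
# every later call with a matching shape/dtype reuses the compiled graph,
# so there is no per-frame "reset".
@tf.function(input_signature=[tf.TensorSpec(shape=[4, 4], dtype=tf.int32)])
def colorize(class_map):
    # tf.gather performs the whole palette lookup in one op, on the GPU
    # if that is where the tensor lives.
    return tf.gather(palette, class_map)

mask = tf.ones([4, 4], dtype=tf.int32)  # every pixel is class 1
rgb = colorize(mask)                    # tf.Tensor, shape (4, 4, 3)
```

If the 80 ms persists even with a fixed signature, that points at the device-to-host copy rather than retracing.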

Any help would be appreciated! Thanks.


TensorRT Version:
CUDA Version: 11.4.14
CUDNN Version: 8.4.1
Operating System + Version: Jetson Xavier AGX Jetpack v5.0.2 L4T 35.1.0
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable): 2.9.1+nv22.9

Relevant Files

TASM git: GitHub - JanMarcelKezmann/TensorFlow-Advanced-Segmentation-Models: A Python Library for High-Level Semantic Segmentation Models based on TensorFlow and Keras with pretrained backbones.
TF-TRT git: GitHub - tensorflow/tensorrt: TensorFlow/TensorRT integration


We are moving this to Jetson Xavier AGX forum to get better help. Please share with us the minimal issue repro model/script.

Thank you.


First, please note that you can maximize the device performance with the following commands:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

The latency might indicate that a memory transfer (e.g. GPU to CPU) is involved.

May I know the output you need in detail?
Do you want to write the output into a file or stream it with RTSP?


Hi! Thanks for the reply. This is the full model architecture (not the TRT model): (4.0 KB)

This is the TRT SavedModel created using TF-TRT: (31.2 MB)


The commands worked, and the overall time went down to 60 ms (previously ~100 ms), so thank you very much for that.

I do want to stream it with RTP using GStreamer. I would like to know if there is anything else that could be done.
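Not sure of your exact setup, but for reference, a gst-launch-1.0 sketch of an RTP/UDP output path using the Jetson hardware encoder might look like the following. The element chain, caps, and the receiver address are assumptions to adapt, not a tested pipeline (videotestsrc stands in for your real frame source):

```shell
# Hypothetical pipeline: convert raw frames into NVMM memory, encode
# with the Jetson hardware H.264 encoder, packetize as RTP, send over UDP.
gst-launch-1.0 videotestsrc \
    ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' \
    ! nvv4l2h264enc ! h264parse ! rtph264pay config-interval=1 \
    ! udpsink host=<receiver-ip> port=5000
```

Replace `<receiver-ip>` with the address of the receiving machine.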



It’s recommended to convert your model into pure TensorRT for optimal performance.
To run inference with an RTSP input/output, please check our DeepStream SDK for examples:
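For context, the common route to pure TensorRT from a Keras model is to export to ONNX first (e.g. with tf2onnx) and then build an engine with trtexec. A sketch only - the file names are placeholders:

```shell
# Hypothetical file names; export the model to ONNX first, e.g.:
#   python3 -m tf2onnx.convert --saved-model saved_model_dir --output danet.onnx
# Then build a TensorRT engine from the ONNX file:
trtexec --onnx=danet.onnx --saveEngine=danet.plan --fp16
```

The resulting engine runs standalone with the TensorRT runtime, without the TensorFlow dependency that TF-TRT carries.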


Thanks for the answer. My initial thought was to convert to pure TensorRT, but TF-TRT seemed easier.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.