Hi! I'm trying to run inference with a semantic segmentation model (DANet, built with the TASM git repository) that I converted to TRT using TF-TRT.
I need this model to run in real time. Inference itself takes about 17 ms, which is fine, but the model returns a tensor in which every pixel holds its class index. Converting that tensor to an RGB image (with a fixed colour per class) takes around 80 ms no matter which way I try:
I tried converting to a numpy array and then using np.where() for each class - the conversion to numpy alone takes 80 ms.
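Roughly what that looks like (the palette values and class count here are placeholders, not my real ones):

```python
import numpy as np

# Placeholder palette: one RGB triplet per class (the real model has more classes).
PALETTE = np.array([
    [0, 0, 0],       # class 0: background
    [128, 0, 0],     # class 1
    [0, 128, 0],     # class 2
], dtype=np.uint8)

def colorize_numpy(pred_tensor):
    class_map = pred_tensor.numpy()                # this conversion alone is ~80 ms
    rgb = np.zeros((*class_map.shape, 3), dtype=np.uint8)
    for cls, color in enumerate(PALETTE):
        rows, cols = np.where(class_map == cls)    # pixel coordinates of this class
        rgb[rows, cols] = color
    return rgb
```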
Tried using tf.where() on each class and then saving the image straight from the tensor, without any numpy conversion - the first tf.where() call takes 80 ms and the remaining classes take ~4 ms each, but this first-call cost comes back on every frame.
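The tf.where() version looks roughly like this (same placeholder palette; I'm relying on tf.where() broadcasting the condition, colour, and image together):

```python
import tensorflow as tf

PALETTE = [(0, 0, 0), (128, 0, 0), (0, 128, 0)]        # placeholder colours

def colorize_tf(class_map):
    # class_map: (H, W) int32 tensor of class indices, still on the GPU
    h, w = class_map.shape
    rgb = tf.zeros((h, w, 3), dtype=tf.uint8)
    for cls, color in enumerate(PALETTE):
        mask = tf.equal(class_map, cls)[..., tf.newaxis]   # (H, W, 1) bool
        color_t = tf.constant(color, dtype=tf.uint8)       # (3,)
        # first tf.where() call: ~80 ms; the rest: ~4 ms - until the next frame
        rgb = tf.where(mask, color_t, rgb)                 # broadcasts to (H, W, 3)
    return rgb
```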
Tried using tf.matmul() and then converting to numpy - the tf.matmul() call itself takes 80 ms.
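And the tf.matmul() version - building a one-hot of the class map and multiplying by the palette (again, shapes and colours are placeholders):

```python
import tensorflow as tf

NUM_CLASSES = 3                                        # placeholder
PALETTE_F = tf.constant([[0, 0, 0],
                         [128, 0, 0],
                         [0, 128, 0]], dtype=tf.float32)   # (C, 3)

def colorize_matmul(class_map):
    h, w = class_map.shape
    one_hot = tf.one_hot(class_map, depth=NUM_CLASSES, dtype=tf.float32)  # (H, W, C)
    rgb = tf.matmul(tf.reshape(one_hot, [-1, NUM_CLASSES]), PALETTE_F)    # ~80 ms here
    return tf.cast(tf.reshape(rgb, (h, w, 3)), tf.uint8)
```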
Overall it seems that TensorFlow rebuilds its internal structures for every frame, even though the input is exactly the same size and dtype every time - so I wrapped the post-processing in a tf.function with an input_signature restricted to that size and dtype, hoping the graph would already be 'preloaded', but that didn't change anything.
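Here's roughly how I set that up (the 512×512 shape and the function body are just an example):

```python
import tensorflow as tf

NUM_CLASSES = 3                                        # placeholder
PALETTE_F = tf.constant([[0, 0, 0],
                         [128, 0, 0],
                         [0, 128, 0]], dtype=tf.float32)

@tf.function(input_signature=[tf.TensorSpec(shape=(512, 512), dtype=tf.int32)])
def colorize_fn(class_map):
    one_hot = tf.one_hot(class_map, depth=NUM_CLASSES, dtype=tf.float32)
    rgb = tf.matmul(tf.reshape(one_hot, [-1, NUM_CLASSES]), PALETTE_F)
    return tf.cast(tf.reshape(rgb, (512, 512, 3)), tf.uint8)

# warm-up call so tracing happens before the real-time loop starts
_ = colorize_fn(tf.zeros((512, 512), dtype=tf.int32))
```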
So maybe it's a different problem entirely: perhaps the output is being transferred from the GPU to the CPU as soon as I call any computational function on it, and that transfer is what takes 80 ms?
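Is there a sane way to confirm this? All I know to check is the tensor's device placement and the cost of the .numpy() copy itself, something like:

```python
import time
import tensorflow as tf

def report_output(pred):
    # pred: the raw output tensor from the TF-TRT model
    print(pred.device)   # e.g. '/job:localhost/replica:0/task:0/device:GPU:0'
    # .numpy() forces a device-to-host copy and also waits for any GPU work
    # still in flight, so this timing covers the transfer plus pending kernels.
    t0 = time.perf_counter()
    _ = pred.numpy()
    print(f"copy + sync took {(time.perf_counter() - t0) * 1000:.1f} ms")
```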
Any help would be appreciated! Thanks.
Environment
TensorRT Version: 8.4.1.5
CUDA Version: 11.4.14
CUDNN Version: 8.4.1
Operating System + Version: JetPack 5.0.2 (L4T 35.1.0) on Jetson AGX Xavier
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable): 2.9.1+nv22.9
Relevant Files
TASM git: https://github.com/JanMarcelKezmann/TensorFlow-Advanced-Segmentation-Models (high-level semantic segmentation models for TensorFlow/Keras with pretrained backbones)
TF-TRT git: https://github.com/tensorflow/tensorrt (TensorFlow/TensorRT integration)