I have a custom neural network that I can run on its own on a Jetson Nano using TF-TRT (both for TF1.x & TF2.x), which gets me to about 30 ms inference time.
However, this involves building the engines and keeping TensorFlow loaded every time, which eats too much RAM to leave room for the rest of the algorithm, and takes multiple minutes to start up.
I converted this NN to TensorRT via the TF2->ONNX->TRT route, and this works wonders for RAM usage, and certainly for startup time (which went down to about 5 s), but then my inference time goes up to about 600 ms.
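For reference, the TF2->ONNX step can be done with the tf2onnx converter; this is a sketch assuming tf2onnx is installed and the model is available as a SavedModel (directory name and opset are placeholders):

```shell
# Export a TF2 SavedModel to ONNX with tf2onnx
# (my_saved_model_dir / my_model.onnx are placeholder paths)
python -m tf2onnx.convert --saved-model my_saved_model_dir --output my_model.onnx --opset 11
```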
Is there a way I can save the whole function from TF-TRT as a single TRT plan that I can use with TensorRT in the same way as when using the onnx2tensorrt scripts?
How it is set up now:
Converting using onnx2trt from https://github.com/onnx/onnx-tensorrt:
onnx2trt my_model.onnx -o my_engine.trt
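As an aside, one common reason a pure-TRT engine runs much slower than TF-TRT is the precision mode the engine was built with (onnx2trt builds FP32 by default, while TF-TRT is often configured for FP16). A sketch of building the engine with `trtexec`, which ships with TensorRT, assuming it is on the PATH and the filenames are placeholders:

```shell
# Build an FP16 engine from the ONNX model with trtexec
# (--workspace is in MB; adjust to what the Nano can spare)
trtexec --onnx=my_model.onnx --saveEngine=my_engine.trt --fp16 --workspace=256
```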
Inference using PyCUDA, after deserializing the engine with:
def load_engine(trt_runtime, plan_path):
    with open(plan_path, 'rb') as f:
        engine_data = f.read()
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine
This might be a stupid question, but I can’t seem to find many clear resources on this.
I’ve tried using the “save as stand-alone TensorRT plan” part of the guide, but this yields multiple separate files which do not include all operations (via the TF2->ONNX->TRT path there is no problem converting everything, and the model only uses Conv2D, BiasAdd, SeparableConv2D, BatchNorm, UpSampling2D, MaxPooling, activations and concatenations).
I am looking into the UFF path now, but I often read that it is deprecated.
Thanks in advance,