I am trying to run inference on my Jetson Nano Developer Kit using a U-Net model that was trained on a separate machine. After training, I converted the Keras model to ONNX and then generated the TensorRT engine with trtexec (this last step was, of course, performed on the Jetson). I then use the engine inside a Docker container I built on the L4T ML image (`l4t-ml:r32.6.1-py3`). The engine appears to load correctly and can run inference on an image (the results are all wrong, but that is a separate issue). This is the snippet I use to run the inference (taken from here):
```python
with engine.create_execution_context() as context:
    context.debug_sync = False
    # Transfer input data to the GPU.
    cuda.memcpy_htod_async(d_input_1, h_input_1, stream)
    # Run inference.
    print('load profiler')
    context.profiler = trt.Profiler()
    print('execute')
    context.execute(batch_size=1, bindings=[int(d_input_1), int(d_output)])
    print('Transfer predictions back from the GPU.')
    # Transfer predictions back from the GPU.
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    # Synchronize the stream.
    stream.synchronize()
    # Return the host output.
    print(h_output.shape)
    out = h_output.reshape((1, -1))
    return out
```
However, every time `context.execute` runs, it prints a large number of lines detailing the operations it performs. This is a problem, because this inference needs to run alongside other tasks. How can I suppress this verbosity entirely? The documentation has not been helpful, as I cannot find the relevant parameter. Below is an example of the printouts:
```
StatefulPartitionedCall/model/Conv/Conv2D__6: 57.1712ms
Reformatting CopyNode for Input Tensor 0 to StatefulPartitionedCall/model/Conv/Conv2D + StatefulPartitionedCall/model/activation/re_lu/Relu: 2.87338ms
StatefulPartitionedCall/model/Conv/Conv2D + StatefulPartitionedCall/model/activation/re_lu/Relu: 4.20417ms
Reformatting CopyNode for Input Tensor 0 to StatefulPartitionedCall/model/expanded_conv/depthwise/pad/Pad + StatefulPartitionedCall/model/expanded_conv/depthwise/Conv/depthwise + StatefulPartitionedCall/model/activation_1/re_lu_1/Relu: 2.47625ms
StatefulPartitionedCall/model/expanded_conv/depthwise/pad/Pad + StatefulPartitionedCall/model/expanded_conv/depthwise/Conv/depthwise + StatefulPartitionedCall/model/activation_1/re_lu_1/Relu: 2.0049ms
...
```
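For reference, each printed line follows a `<layer name>: <time>ms` pattern (layer names may themselves contain `+`). As a stopgap I can at least post-process captured output with a small pure-Python parser; this is just a sketch of my own, not part of my actual pipeline:

```python
import re

def parse_layer_times(log_text):
    """Parse per-layer lines of the form '<layer name>: <time>ms'
    into a dict mapping layer name -> milliseconds."""
    pattern = re.compile(r"^(.*): ([\d.]+)ms$")
    times = {}
    for line in log_text.splitlines():
        m = pattern.match(line.strip())
        if m:
            times[m.group(1)] = float(m.group(2))
    return times

sample = (
    "StatefulPartitionedCall/model/Conv/Conv2D__6: 57.1712ms\n"
    "StatefulPartitionedCall/model/Conv/Conv2D + "
    "StatefulPartitionedCall/model/activation/re_lu/Relu: 4.20417ms"
)
print(parse_layer_times(sample))  # two layer entries with their times
```

This only tidies the output after the fact, though; what I really want is to stop the printing at the source.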
Any suggestions are greatly appreciated.
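One workaround I have considered (though I would prefer a proper switch if one exists) is replacing `trt.Profiler()` with a custom profiler that accumulates layer times silently instead of printing them. Below is a plain-Python sketch of the idea; on the Jetson, the class would subclass `trt.IProfiler` (calling its `__init__`), and TensorRT would invoke `report_layer_time` once per layer:

```python
class QuietProfiler:
    """Sketch: accumulate per-layer times instead of printing them.
    With TensorRT available this would be `class QuietProfiler(trt.IProfiler)`."""

    def __init__(self):
        self.layer_times = {}

    def report_layer_time(self, layer_name, ms):
        # Called once per layer per inference; just record the time.
        self.layer_times[layer_name] = self.layer_times.get(layer_name, 0.0) + ms

    def total_ms(self):
        return sum(self.layer_times.values())

# Usage would be: context.profiler = QuietProfiler()  # instead of trt.Profiler()
```

But ideally I would like to drop the profiler entirely and silence whatever else is printing, which is why I am asking here.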
TensorRT Version: 126.96.36.199
Python Version (if applicable): 3.6.9