Reduce verbosity during context execution

Description

Hi.

I am trying to run inference on my Jetson Nano Developer Kit using a U-Net model that was trained on a separate system. After training, I converted the Keras model to ONNX and then generated the TensorRT engine with trtexec (this last step was, of course, performed on the Jetson). I am using the engine inside a Docker container built on the L4T ML image (l4t-ml:r32.6.1-py3); the engine appears to load properly and can run inference on an image (the results are all wrong, but that is a separate issue). This is the snippet I use to run the inference (taken from here):

with engine.create_execution_context() as context:
    context.debug_sync = False

    # Transfer input data to the GPU.
    cuda.memcpy_htod_async(d_input_1, h_input_1, stream)

    # Attach the default profiler, which reports per-layer timings
    # after every execute() call.
    print('load profiler')
    context.profiler = trt.Profiler()

    # Run inference.
    print('execute')
    context.execute(batch_size=1, bindings=[int(d_input_1), int(d_output)])

    # Transfer predictions back from the GPU.
    print('Transfer predictions back from the GPU.')
    cuda.memcpy_dtoh_async(h_output, d_output, stream)

    # Synchronize the stream.
    stream.synchronize()

    # Return the host output.
    print(h_output.shape)
    out = h_output.reshape((1, -1))
    return out
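For completeness, the engine, stream, and buffers used above are created earlier in the script along these lines (a sketch: the buffer names match the snippet, while the engine filename, the float32 dtype, and the use of binding indices 0/1 are assumptions on my part):

import numpy as np
import pycuda.autoinit  # initializes CUDA and creates a context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine generated by trtexec.
with open('model.engine', 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# Page-locked host buffers plus matching device allocations,
# one pair per engine binding (input at index 0, output at index 1).
h_input_1 = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=np.float32)
h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=np.float32)
d_input_1 = cuda.mem_alloc(h_input_1.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)

stream = cuda.Stream()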

However, every time context.execute runs it prints a great many lines detailing the operations it is performing. This is a problem, because this inference needs to run alongside other things. How can I remove the verbosity completely? The documentation has not been helpful, as I cannot find the relevant parameter. Below is an example of the printouts:

StatefulPartitionedCall/model/Conv/Conv2D__6: 57.1712ms
Reformatting CopyNode for Input Tensor 0 to StatefulPartitionedCall/model/Conv/Conv2D + StatefulPartitionedCall/model/activation/re_lu/Relu: 2.87338ms
StatefulPartitionedCall/model/Conv/Conv2D + StatefulPartitionedCall/model/activation/re_lu/Relu: 4.20417ms
Reformatting CopyNode for Input Tensor 0 to StatefulPartitionedCall/model/expanded_conv/depthwise/pad/Pad + StatefulPartitionedCall/model/expanded_conv/depthwise/Conv/depthwise + StatefulPartitionedCall/model/activation_1/re_lu_1/Relu: 2.47625ms
StatefulPartitionedCall/model/expanded_conv/depthwise/pad/Pad + StatefulPartitionedCall/model/expanded_conv/depthwise/Conv/depthwise + StatefulPartitionedCall/model/activation_1/re_lu_1/Relu: 2.0049ms
...

Any suggestion is greatly appreciated.

Environment

TensorRT Version: 8.0.1.6
Python Version (if applicable): 3.6.9


Please refer to the following document to set the logger level:
https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/Logger.html#tensorrt.ILogger.Severity

Also, we recommend using the latest TensorRT version; please let us know if you still face this issue.

We are moving this post to the Jetson Nano forum to get further assistance.

Thank you.

Hi,

Just set the logger severity to ERROR or even INTERNAL_ERROR.

https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/Logger.html?highlight=info#tensorrt.ILogger.Severity

For example:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.ERROR)
runtime = trt.Runtime(TRT_LOGGER)
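Note also that the per-layer timing lines shown in the question come from the trt.Profiler() attached to the execution context: the default profiler prints layer times to stdout after every execute() call regardless of logger severity, so that line needs to be removed as well. A minimal sketch combining both changes (the engine filename is illustrative):

import tensorrt as trt

# ERROR severity suppresses TensorRT's INFO and WARNING messages.
TRT_LOGGER = trt.Logger(trt.Logger.ERROR)

with open('model.engine', 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# Create the context without assigning context.profiler, so no
# per-layer timings are printed during execute().
context = engine.create_execution_context()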

Thanks.
