TensorRT Engine python

I serialized an engine from PyTorch using Pytorch2trt.
The engine outputs wrong results: all values are zeros, both for a real input (an image) and for a random input.
How can I inspect the intermediate values to debug my interface to the engine?
My steps:

  1. Define the CUDA device with: device_context = cuda.Device(0).make_context()
  2. Define the buffers (inputs, outputs), the bindings and the CUDA stream.
  3. Define the execution context with: exec_context = engine.create_execution_context()
  4. Copy an input to the input buffer using np.copyto().
  5. Copy the input from host memory to GPU memory with: cuda.memcpy_htod_async(inp.device, inp.host, stream)
  6. Run inference with:
    exec_context.execute_async(batch_size=1, bindings=bindings, stream_handle=stream.handle)
  7. Copy the result from GPU memory to host memory with:
    cuda.memcpy_dtoh_async(out.host, out.device, stream)
  8. Synchronize the stream.
  9. Read the final output with: print(out.host)

The final results are zeros… How can I check which step is wrong?



Do you have any intermediate model format like .onnx?
If yes, it’s recommended to check it with trtexec first.

/usr/src/tensorrt/bin/trtexec --onnx=[model] --dumpOutput

If the output looks normal, the issue most likely comes from your implementation.
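
One host-side thing that may be worth ruling out first is the input copy itself: np.copyto() will broadcast and cast silently, so a wrongly shaped or typed image can end up in the buffer without any error. The snippet below is only a sketch, assuming the host buffer is a flat NumPy array (as in the usual page-locked buffer setup); fill_host_buffer is a hypothetical helper name, not part of TensorRT or PyCUDA.

```python
import numpy as np

def fill_host_buffer(host_buf, data):
    """Copy `data` into the flat host buffer, failing loudly on a size mismatch."""
    data = np.ascontiguousarray(data)
    if data.size != host_buf.size:
        raise ValueError(
            f"input has {data.size} elements, buffer expects {host_buf.size}"
        )
    # Cast explicitly so that e.g. a uint8 image or a float64 array is
    # converted on purpose rather than implicitly inside np.copyto().
    np.copyto(host_buf, data.astype(host_buf.dtype, copy=False).ravel())
    return host_buf
```

You could call it as fill_host_buffer(inp.host, image) in place of the plain np.copyto() step, where inp.host and image are the names from your own code.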


Before I extracted the engine I had a TRTModule in Python that gave correct results, so the problem is in the implementation. But how can I test where the problem is? For example, how do I know whether the problem is before the inference step (for example, in sending the data to the GPU) or after the inference step (for example, in copying the result back from the GPU)?


You will need to use CUDA to inspect the values in a GPU buffer.
But there are some limitations with the Python interface.

A possible way is to add some extra memcpy calls to see if each buffer works as expected.
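
A rough sketch of that idea: copy the input to the GPU, copy it straight back into a separate host array, and compare. The helper names (describe, roundtrip_ok) are made up for illustration, and the copy functions are passed in as arguments so the check does not depend on a particular CUDA wrapper; with PyCUDA they would be cuda.memcpy_htod and cuda.memcpy_dtoh.

```python
import numpy as np

def describe(name, host_array):
    """Print basic statistics so an all-zero buffer is easy to spot."""
    arr = np.asarray(host_array)
    stats = {
        "name": name,
        "min": float(arr.min()),
        "max": float(arr.max()),
        "all_zero": not arr.any(),
    }
    print(stats)
    return stats

def roundtrip_ok(host_src, device_ptr, memcpy_htod, memcpy_dtoh):
    """Copy host_src to the device, read it back, and compare with the original."""
    memcpy_htod(device_ptr, host_src)      # host -> device
    readback = np.empty_like(host_src)
    memcpy_dtoh(readback, device_ptr)      # device -> fresh host array
    describe("readback", readback)
    return bool(np.array_equal(readback, host_src))
```

For example, roundtrip_ok(inp.host, inp.device, cuda.memcpy_htod, cuda.memcpy_dtoh) before the inference step tells you whether the input really reaches the GPU. If that passes, calling describe("output", out.host) after the stream has been synchronized narrows the problem down to the inference call or the output copy.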