TensorRT engine in Python returns all-zero outputs

Hi,
I serialized an engine from PyTorch using torch2trt.
The engine outputs wrong results, i.e. all values are zero, for both a real input (an image) and a random input.
How can I inspect the intermediate values to debug my interface to the engine?
My steps:

  1. Create a CUDA context by: device_context = cuda.Device(0).make_context()
  2. Define buffers (inputs, outputs), bindings, and a CUDA stream.
  3. Create an execution context by: exec_context = engine.create_execution_context()
  4. Copy the input into the host input buffer using np.copyto().
  5. Copy the input from host memory to GPU memory by: cuda.memcpy_htod_async(inp.device, inp.host, stream)
  6. Run inference by:
    exec_context.execute_async(batch_size=1, bindings=bindings, stream_handle=stream.handle)
  7. Copy result from GPU memory to host memory by:
    cuda.memcpy_dtoh_async(out.host, out.device, stream)
  8. Synchronize the stream by:
    stream.synchronize()
  9. Read the final output by: print(out.host)

The final results are all zeros… How can I check which step went wrong?
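For reference, the steps above can be sketched end to end like this. This is a minimal sketch assuming the pycuda + tensorrt pattern and the implicit-batch execute_async API used in the post; the engine path, binding order (input first, output second), and helper names are illustrative, not taken from the original code.

```python
import numpy as np


def run_inference(engine_path, input_array):
    """Sketch of the host/device round trip described in the steps above.
    Imports are kept inside the function so this file loads on machines
    without a GPU. Assumes one input binding followed by one output binding.
    """
    import tensorrt as trt
    import pycuda.driver as cuda
    import pycuda.autoinit  # noqa: F401 - creates a CUDA context on device 0

    logger = trt.Logger(trt.Logger.WARNING)
    with open(engine_path, "rb") as f, trt.Runtime(logger) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())

    stream = cuda.Stream()
    # Step 2: one page-locked host buffer + one device buffer per binding.
    host_bufs, dev_bufs, bindings = [], [], []
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding))
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        host = cuda.pagelocked_empty(size, dtype)
        dev = cuda.mem_alloc(host.nbytes)
        host_bufs.append(host)
        dev_bufs.append(dev)
        bindings.append(int(dev))

    # Step 4: copy the input into the page-locked host buffer.
    np.copyto(host_bufs[0], input_array.ravel())
    # Step 5: host -> device.
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
    # Steps 3 + 6: create the execution context and run inference.
    with engine.create_execution_context() as exec_context:
        exec_context.execute_async(batch_size=1, bindings=bindings,
                                   stream_handle=stream.handle)
        # Step 7: device -> host.
        cuda.memcpy_dtoh_async(host_bufs[1], dev_bufs[1], stream)
        # Step 8: wait for all queued work before reading the result.
        stream.synchronize()
    # Step 9: read the final output.
    return host_bufs[1]
```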

Thanks,
Avi

Hi,

Do you have an intermediate model format such as .onnx?
If so, it’s recommended to check it with trtexec first.

/usr/src/tensorrt/bin/trtexec --onnx=[model] --dumpOutput

If the output looks normal, the issue should come from the implementation.

Thanks.

Hi,
Before I extracted the engine I had a TRTModule in Python that produced correct results, so the problem is in my implementation. But how can I find where the problem is? For example, how do I know if the problem occurs before the inference step (e.g. in sending the data to the GPU) or after it (e.g. in reading the result back from the GPU)?

Hi,

You will need CUDA to inspect the values in a GPU buffer,
but there are some limitations in the Python interface.

A possible way is to add some extra memcpy calls to check that each buffer holds the values you expect.
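As a concrete example of the extra-memcpy idea, you can copy the input buffer straight back from the GPU after the host-to-device transfer and compare it with what was sent. This is a sketch only: `inp` is assumed to have `.host` (a page-locked numpy array) and `.device` (a device allocation) as in the original post, and the function name is illustrative.

```python
def check_htod_roundtrip(inp, stream):
    """Debugging sketch: after memcpy_htod_async, copy the same device
    buffer back into a scratch host array and compare it to inp.host.
    If the read-back is all zeros while inp.host is not, the problem is
    in the transfer (or the buffer sizes / bindings); if it matches,
    look at the inference step instead. Imports are local so this file
    loads on machines without a GPU.
    """
    import numpy as np
    import pycuda.driver as cuda

    # The transfer under test: host -> device.
    cuda.memcpy_htod_async(inp.device, inp.host, stream)

    # Extra memcpy purely for debugging: device -> scratch host buffer.
    scratch = cuda.pagelocked_empty_like(inp.host)
    cuda.memcpy_dtoh_async(scratch, inp.device, stream)
    stream.synchronize()

    if not np.array_equal(scratch, inp.host):
        raise RuntimeError("host -> device transfer did not round-trip")
    return scratch
```

The same pattern works after the inference call: copy the output device buffer to a scratch host array before the normal step 7, to tell apart "the network wrote zeros" from "the result copy is reading the wrong buffer".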
Thanks.