Hi,
I serialized an engine from PyTorch using Pytorch2trt.
The engine outputs wrong results: all values are zeros, both for a real input (an image) and for random input.
How can I inspect the intermediate values to debug my interface to the engine?
My steps:
- Create the CUDA device context by: device_context = cuda.Device(0).make_context()
- Define buffers (inputs, outputs), bindings and a CUDA stream
- Create the execution context by: exec_context = engine.create_execution_context()
- Copy an input to the input buffer using np.copyto()
- Copy the input from host memory to GPU memory by: cuda.memcpy_htod_async(inp.device, inp.host, stream)
- Run inference by: exec_context.execute_async(batch_size=1, bindings=bindings, stream_handle=stream.handle)
- Copy the result from GPU memory to host memory by: cuda.memcpy_dtoh_async(out.host, out.device, stream)
- Synchronize the stream by: stream.synchronize()
- Read the final output by: print(out.host)
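One way to narrow the steps above down might be to print basic statistics of the host buffers after each step, so it becomes visible where the data turns into zeros. This is only a minimal sketch, assuming the host buffers are NumPy arrays (or NumPy-compatible page-locked arrays); `summarize` is a hypothetical helper name, not part of TensorRT or PyCUDA.

```python
import numpy as np

def summarize(name, host_buffer):
    """Print and return basic statistics of a host-side buffer.

    `summarize` is a hypothetical debugging helper (an assumption,
    not a TensorRT or PyCUDA function).
    """
    arr = np.asarray(host_buffer)
    stats = {"dtype": str(arr.dtype), "shape": arr.shape, "size": int(arr.size)}
    if arr.size > 0:
        stats.update({
            "min": float(arr.min()),
            "max": float(arr.max()),
            "mean": float(arr.mean()),
            "nonzero": int(np.count_nonzero(arr)),
            "has_nan": bool(np.isnan(arr).any()),
        })
    print(name, stats)
    return stats

# Possible call sites in the steps above (inp and out are the buffer
# objects from this post, so these lines are left as comments):
#   summarize("input after np.copyto", inp.host)
#   summarize("output after stream.synchronize", out.host)
```

If the input statistics already look wrong (all zeros, unexpected dtype or size), the problem would be before inference; if the input looks fine and only the output is zero, the bindings or the execute call would be the next thing to look at.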
The final results are zeros… How can I check which step is wrong?
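Another thing that could be checked is whether the buffers and the `bindings` list match what the engine expects (order, shape, dtype). The following is a rough sketch, assuming a TensorRT version that still provides the index-based binding API (`num_bindings`, `get_binding_name`, `binding_is_input`, `get_binding_shape`, `get_binding_dtype`); newer releases replace these with a name-based tensor API. `describe_bindings` is a hypothetical helper name.

```python
def describe_bindings(engine):
    """List every binding of a TensorRT engine with its name, direction,
    shape and dtype.

    Hypothetical helper; assumes the index-based binding API of older
    TensorRT releases is available on `engine`.
    """
    rows = []
    for i in range(engine.num_bindings):
        rows.append({
            "index": i,
            "name": engine.get_binding_name(i),
            "is_input": bool(engine.binding_is_input(i)),
            # For implicit-batch engines this shape excludes the batch dimension.
            "shape": tuple(engine.get_binding_shape(i)),
            "dtype": str(engine.get_binding_dtype(i)),
        })
    for row in rows:
        print(row)
    return rows

# Possible call site (engine is the deserialized engine from this post):
#   describe_bindings(engine)
```

Comparing the printed shapes and dtypes with the sizes and dtypes of the allocated host buffers, and the printed order with the order of the device pointers in `bindings`, may show a mismatch.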
Thanks,
Avi