Description
I’m exporting a pre-trained PyTorch model using torch.onnx.export().
The model passes onnx.checker.check_model() and produces the correct output with onnxruntime.
The ONNX model is parsed into a TensorRT engine, the engine is serialized and loaded, and an execution context is created and run, all successfully and with no errors logged. However, the output vector is always all NaN. This is not the case in PyTorch or with an onnxruntime session using the same model.
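For completeness, the onnxruntime check mentioned above looks roughly like this (a trimmed sketch; the model path and the random input are placeholders, and the "input"/"output" names match the export code further down):

import numpy as np
import onnxruntime as ort

# Placeholder input for illustration; shape and dtype match what is later fed to TensorRT.
image = np.random.randn(1, 3, 224, 224).astype(np.float32)

session = ort.InferenceSession("trt_model.onnx")  # placeholder path
ort_output = session.run(["output"], {"input": image})[0]
print("any NaN:", np.isnan(ort_output).any())  # False with onnxruntime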
Environment
TensorRT Version: 7.2.3.4
GPU Type: GeForce GTX 1650
Nvidia Driver Version: 460.32.03
CUDA Version: cuda-11.1
CUDNN Version: cudnn-8.1.0.77
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
PyTorch Version (if applicable): 1.8.1+cu102
Relevant Files / Steps To Reproduce
Relevant TensorRT code:
import numpy as np
import pycuda.autoinit  # initializes the CUDA driver and creates a context
import pycuda.driver as cuda
import tensorrt as trt


class TRTInference:
    ENGINE_PATH = "trt_model.engine"

    def __init__(self):
        self.engine = None
        self.logger = trt.Logger()

    def __call__(self, image):
        assert self.engine is not None, "Inference before engine created or loaded."
        input_shape = self.engine.get_binding_shape("input")
        assert tuple(image.shape) == tuple(input_shape), "Incorrect image shape passed."
        assert image.dtype == np.float32, "Incorrect image dtype passed."
        # Allocate page-locked host buffers and device buffers for input and output
        input_size = trt.volume(input_shape) * self.engine.max_batch_size * np.dtype(np.float32).itemsize
        device_input = cuda.mem_alloc(input_size)
        host_input = cuda.pagelocked_empty(trt.volume(input_shape) * self.engine.max_batch_size, dtype=np.float32)
        host_input[:] = image.reshape(-1)
        output_shape = self.engine.get_binding_shape("output")
        host_output = cuda.pagelocked_empty(trt.volume(output_shape) * self.engine.max_batch_size, dtype=np.float32)
        device_output = cuda.mem_alloc(host_output.nbytes)
        stream = cuda.Stream()
        # Transfer from CPU (host) to GPU (device) using the stream
        cuda.memcpy_htod_async(device_input, host_input, stream)
        # Run inference, copy the result back to the host, and wait for the stream
        context = self.engine.create_execution_context()
        context.execute_async_v2(bindings=[int(device_input), int(device_output)], stream_handle=stream.handle)
        cuda.memcpy_dtoh_async(host_output, device_output, stream)
        stream.synchronize()
        return host_output

    def create_from_onnx(self, onnx_path):
        builder = trt.Builder(self.logger)
        network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
        print("Reading model.")
        with trt.OnnxParser(network, self.logger) as parser:
            if not parser.parse_from_file(onnx_path):
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
        print("Done reading model.")
        builder.max_batch_size = 1
        config = builder.create_builder_config()
        config.max_workspace_size = 1 << 30
        print("Building TensorRT engine.")
        engine = builder.build_engine(network, config)
        assert engine is not None, "Failed to build TensorRT engine."
        self.engine = engine
        print("Done building TensorRT engine.")

    def load(self):
        with open(self.ENGINE_PATH, "rb") as f, trt.Runtime(self.logger) as runtime:
            engine = runtime.deserialize_cuda_engine(f.read())
        assert engine is not None, "Failed to load TensorRT engine."
        self.engine = engine
        print("Loaded TensorRT engine.")

    def save(self):
        assert self.engine is not None, "Saving before created."
        serialized = self.engine.serialize()
        with open(self.ENGINE_PATH, "wb") as f:
            f.write(serialized)
        print("Serialized TensorRT engine.")
Relevant ONNX creation code:
import onnx
import onnxsim
import torch


def pytorch_to_onnx(pytorch_model, model_path):
    # Export the PyTorch model to ONNX
    pytorch_model.eval()
    pytorch_model.cuda()
    x = torch.randn(1, 3, 224, 224, requires_grad=True, device="cuda")
    torch.onnx.export(
        pytorch_model, x, model_path, export_params=True, opset_version=10,
        input_names=["input"], output_names=["output"]
    )
    # Simplify the ONNX graph, check it, and overwrite the exported file
    model, status = onnxsim.simplify(onnx.load(model_path))
    onnx.checker.check_model(model)
    with open(model_path, "wb") as f:
        onnx.save(model, f)
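And the export itself is just (a sketch; torchvision's resnet18 is only a stand-in here for the pre-trained model I'm actually exporting):

import torchvision

# resnet18 is a stand-in for illustration; the real model is a different
# pre-trained network, exported through the same function.
pytorch_to_onnx(torchvision.models.resnet18(pretrained=True), "trt_model.onnx")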
I’m guessing I might be doing something wrong in the __call__() function, since no error is thrown at any other step. I believe I’m following the Python API documentation listed here almost exactly.