Simple ResNet model from PyTorch: TensorRT outputs all "nan"


I’m exporting a pre-trained PyTorch model using torch.onnx.export().

The model passes onnx.checker.check_model() and produces the correct output in onnxruntime.

The ONNX model is parsed into a TensorRT engine, which is then serialized and loaded; an execution context is created and executed, all without any errors being logged. However, the output vector is always entirely "nan". This does not happen in PyTorch or in an onnxruntime session with the same model.
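To see whether the TensorRT output is entirely NaN or only diverges in places, it can help to compare it element-wise against the onnxruntime result for the same input. The following is a minimal NumPy sketch; `compare_outputs` is a hypothetical helper name, and the default tolerances are illustrative assumptions, not values recommended by TensorRT.

```python
import numpy as np


def compare_outputs(trt_out, ref_out, rtol=1e-3, atol=1e-3):
    """Summarize how a TensorRT output differs from a reference output.

    ref_out is assumed to come from onnxruntime (or PyTorch) for the same
    input. The tolerances are illustrative defaults, not official values.
    """
    trt_out = np.asarray(trt_out, dtype=np.float32).reshape(-1)
    ref_out = np.asarray(ref_out, dtype=np.float32).reshape(-1)
    if trt_out.shape != ref_out.shape:
        raise ValueError(f"size mismatch: {trt_out.size} vs {ref_out.size}")

    # Only finite TensorRT values can be meaningfully diffed.
    finite = np.isfinite(trt_out)
    if finite.any():
        max_abs_diff = float(np.max(np.abs(trt_out[finite] - ref_out[finite])))
    else:
        max_abs_diff = float("nan")

    return {
        "nan_count": int(np.isnan(trt_out).sum()),
        "inf_count": int(np.isinf(trt_out).sum()),
        "max_abs_diff": max_abs_diff,
        # NaN never compares close, so an all-NaN output yields False here.
        "allclose": bool(np.allclose(trt_out, ref_out, rtol=rtol, atol=atol)),
    }
```

For example, with `ref = session.run(None, {"input": image})[0]` from an `onnxruntime.InferenceSession`, calling `compare_outputs(trt_output, ref)` reports how many elements are NaN and whether the two outputs agree within tolerance.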


TensorRT Version:
GPU Type: GeForce GTX 1650
Nvidia Driver Version: 460.32.03
CUDA Version: cuda-11.1
CUDNN Version: cudnn-
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
PyTorch Version (if applicable): 1.8.1+cu102

Relevant Files / Steps To Reproduce

Relevant TensorRT code:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates the CUDA context used below)
import pycuda.driver as cuda
import tensorrt as trt


class TRTInference:
    ENGINE_PATH = "trt_model.engine"

    def __init__(self):
        self.engine = None
        self.logger = trt.Logger()

    def __call__(self, image):
        assert self.engine is not None, "Inference before engine created or loaded."

        input_shape = self.engine.get_binding_shape("input")
        assert tuple(image.shape) == input_shape, "Incorrect image shape passed."
        assert image.dtype == np.float32, "Incorrect image dtype passed."

        input_size = trt.volume(input_shape) * self.engine.max_batch_size * np.dtype(np.float32).itemsize
        device_input = cuda.mem_alloc(input_size)
        host_input = cuda.pagelocked_empty(trt.volume(input_shape) * self.engine.max_batch_size, dtype=np.float32)
        host_input[:] = image.reshape(-1)

        output_shape = self.engine.get_binding_shape("output")
        host_output = cuda.pagelocked_empty(trt.volume(output_shape) * self.engine.max_batch_size, dtype=np.float32)
        device_output = cuda.mem_alloc(host_output.nbytes)

        stream = cuda.Stream()
        # Transfer from cpu (host) to gpu (device) using stream
        cuda.memcpy_htod_async(device_input, host_input, stream)
        context = self.engine.create_execution_context()
        context.execute_async_v2(bindings=[int(device_input), int(device_output)], stream_handle=stream.handle)
        cuda.memcpy_dtoh_async(host_output, device_output, stream)

        return host_output

    def create_from_onnx(self, onnx_path):
        builder = trt.Builder(self.logger)
        network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

        print("Reading model.")
        with trt.OnnxParser(network, self.logger) as parser:
            if not parser.parse_from_file(onnx_path):
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
        print("Done reading model.")

        builder.max_batch_size = 1
        config = builder.create_builder_config()
        config.max_workspace_size = 1 << 30

        print("Building TensorRT engine.")
        engine = builder.build_engine(network, config)
        assert engine is not None, "Failed to build TensorRT engine."
        self.engine = engine
        print("Done building TensorRT engine.")

    def load(self):
        with open(self.ENGINE_PATH, "rb") as f, trt.Runtime(self.logger) as runtime:
            engine = runtime.deserialize_cuda_engine(f.read())
        assert engine is not None, "Failed to load TensorRT engine."
        self.engine = engine
        print("Loaded TensorRT engine.")

    def save(self):
        assert self.engine is not None, "Saving before created."
        se = self.engine.serialize()
        with open(self.ENGINE_PATH, "wb") as f:
            f.write(se)
        print("Serialized TensorRT engine.")
```

Relevant ONNX creation code:

```python
import onnx
import onnxsim
import torch


def pytorch_to_onnx(pytorch_model, model_path):
    # Export PyTorch model
    x = torch.randn(1, 3, 224, 224, requires_grad=True, device="cuda")
    torch.onnx.export(
        pytorch_model, x, model_path, export_params=True, opset_version=10, input_names=["input"],
        output_names=["output"])

    # Simplify ONNX graph
    model, status = onnxsim.simplify(onnx.load(model_path))
    with open(model_path, "wb") as f:
        onnx.save(model, f)
```

I suspect I'm doing something wrong in the __call__() method, since no error is raised at any other step. As far as I can tell, I'm following the Python API documentation listed here nearly exactly.
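One likely cause worth ruling out (it is not confirmed anywhere in this thread): `__call__` returns `host_output` right after queuing `memcpy_dtoh_async`, without waiting on the stream. Because that copy is asynchronous and `cuda.pagelocked_empty` does not initialize its memory, the returned array can still hold whatever was in the buffer, which may include NaN. Synchronizing also keeps the device buffers alive until the GPU has finished with them. The sketch below is a standalone rewrite of that method with `stream.synchronize()` before returning. It assumes PyCUDA and the TensorRT 7.x/8.x binding-based API (`get_binding_shape`, `get_binding_index`, `num_bindings`, `execute_async_v2`), with binding names "input" and "output"; `trt_infer` is a hypothetical name.

```python
import numpy as np


def trt_infer(engine, image):
    """Run one synchronous inference on a built or deserialized TensorRT engine.

    Assumes the TensorRT 7.x/8.x binding API, one float32 input binding named
    "input" and one float32 output binding named "output".
    """
    # Imported lazily so this module loads even on machines without a GPU.
    import pycuda.autoinit  # noqa: F401  (creates and activates a CUDA context)
    import pycuda.driver as cuda
    import tensorrt as trt

    input_shape = tuple(engine.get_binding_shape("input"))
    output_shape = tuple(engine.get_binding_shape("output"))
    assert tuple(image.shape) == input_shape, "Incorrect image shape passed."

    # Page-locked host buffers are required for truly asynchronous copies.
    host_input = cuda.pagelocked_empty(trt.volume(input_shape), dtype=np.float32)
    host_output = cuda.pagelocked_empty(trt.volume(output_shape), dtype=np.float32)
    host_input[:] = np.ascontiguousarray(image, dtype=np.float32).reshape(-1)

    device_input = cuda.mem_alloc(host_input.nbytes)
    device_output = cuda.mem_alloc(host_output.nbytes)

    # Place each device pointer at its binding index instead of assuming order.
    bindings = [0] * engine.num_bindings
    bindings[engine.get_binding_index("input")] = int(device_input)
    bindings[engine.get_binding_index("output")] = int(device_output)

    context = engine.create_execution_context()
    stream = cuda.Stream()
    cuda.memcpy_htod_async(device_input, host_input, stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(host_output, device_output, stream)
    # Wait until the copy into host_output has finished before reading it.
    # Without this, the function can return a still-uninitialized buffer.
    stream.synchronize()

    return host_output.reshape(output_shape)
```

Inside the class, the body of `__call__` could then reduce to `return trt_infer(self.engine, image)`. Creating the execution context once and reusing it across calls would avoid repeated setup, but that is an optimization, not a correctness fix.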

Hi @gerardmaggiolino,

Could you please share the ONNX model that reproduces the issue, along with complete scripts, so we can try it on our end? Please also let us know the steps to run them.
In the meantime, we recommend trying to generate the engine with the trtexec command instead and verifying the inference output.
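For the trtexec route, the command can be assembled and run from Python as sketched below. This assumes the `trtexec` binary that ships with TensorRT is on `PATH` (it is often installed under `/usr/src/tensorrt/bin`); `trtexec_command` and `run_trtexec` are hypothetical helper names. The flags used (`--onnx`, `--saveEngine`, `--dumpOutput`, `--verbose`) are standard trtexec options; `--dumpOutput` prints the output tensors, which shows whether an engine built by trtexec also produces NaN (trtexec feeds random inputs unless told otherwise).

```python
import shlex
import subprocess


def trtexec_command(onnx_path, engine_path=None, dump_output=True, verbose=False):
    """Assemble a trtexec invocation that builds an engine from an ONNX file."""
    cmd = ["trtexec", "--onnx=" + str(onnx_path)]
    if engine_path is not None:
        # Serialize the built engine so it can be loaded later from Python.
        cmd.append("--saveEngine=" + str(engine_path))
    if dump_output:
        # Print the output tensors after inference to check for NaN values.
        cmd.append("--dumpOutput")
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_trtexec(cmd):
    """Run trtexec and return its combined stdout/stderr text."""
    print("Running:", " ".join(shlex.quote(part) for part in cmd))
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # text mode that also works on Python 3.6
        check=True,
    )
    return result.stdout
```

For example, `print(run_trtexec(trtexec_command("model.onnx", "trt_model.engine")))` builds and saves the engine, then prints the dumped outputs.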

For your reference,

Thank you.