Description
I’m exporting a pre-trained PyTorch model using torch.onnx.export().
The model passes onnx.checker.check_model() and produces the correct output with onnxruntime.
The ONNX model is parsed into a TensorRT engine, the engine is serialized and loaded, and an execution context is created and run, all successfully and with no errors logged. However, the output vector is always all NaN. This is not the case in PyTorch or with an onnxruntime session using the same model.
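For completeness, the onnxruntime check mentioned above looks roughly like this (a trimmed sketch; the model path and the random input are placeholders, and the "input"/"output" names match the export code further down):

import numpy as np
import onnxruntime as ort

# Placeholder input for illustration; shape and dtype match what is later fed to TensorRT.
image = np.random.randn(1, 3, 224, 224).astype(np.float32)

session = ort.InferenceSession("trt_model.onnx")  # placeholder path
ort_output = session.run(["output"], {"input": image})[0]
print("any NaN:", np.isnan(ort_output).any())  # False with onnxruntime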
Environment
TensorRT Version: 7.2.3.4
GPU Type: GeForce GTX 1650
Nvidia Driver Version: 460.32.03
CUDA Version: cuda-11.1
CUDNN Version: cudnn-8.1.0.77
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
PyTorch Version (if applicable): 1.8.1+cu102
Relevant Files / Steps To Reproduce
Relevant TensorRT code:
import numpy as np
import pycuda.autoinit  # initializes the CUDA driver and creates a context
import pycuda.driver as cuda
import tensorrt as trt


class TRTInference:
    ENGINE_PATH = "trt_model.engine"

    def __init__(self):
        self.engine = None
        self.logger = trt.Logger()

    def __call__(self, image):
        assert self.engine is not None, "Inference before engine created or loaded."
        input_shape = self.engine.get_binding_shape("input")
        assert tuple(image.shape) == tuple(input_shape), "Incorrect image shape passed."
        assert image.dtype == np.float32, "Incorrect image dtype passed."
        # Allocate page-locked host buffers and device buffers for input and output
        input_size = trt.volume(input_shape) * self.engine.max_batch_size * np.dtype(np.float32).itemsize
        device_input = cuda.mem_alloc(input_size)
        host_input = cuda.pagelocked_empty(trt.volume(input_shape) * self.engine.max_batch_size, dtype=np.float32)
        host_input[:] = image.reshape(-1)
        output_shape = self.engine.get_binding_shape("output")
        host_output = cuda.pagelocked_empty(trt.volume(output_shape) * self.engine.max_batch_size, dtype=np.float32)
        device_output = cuda.mem_alloc(host_output.nbytes)
        stream = cuda.Stream()
        # Transfer from CPU (host) to GPU (device) using the stream
        cuda.memcpy_htod_async(device_input, host_input, stream)
        # Run inference, copy the result back to the host, and wait for the stream
        context = self.engine.create_execution_context()
        context.execute_async_v2(bindings=[int(device_input), int(device_output)], stream_handle=stream.handle)
        cuda.memcpy_dtoh_async(host_output, device_output, stream)
        stream.synchronize()
        return host_output

    def create_from_onnx(self, onnx_path):
        builder = trt.Builder(self.logger)
        network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
        print("Reading model.")
        with trt.OnnxParser(network, self.logger) as parser:
            if not parser.parse_from_file(onnx_path):
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
        print("Done reading model.")
        builder.max_batch_size = 1
        config = builder.create_builder_config()
        config.max_workspace_size = 1 << 30
        print("Building TensorRT engine.")
        engine = builder.build_engine(network, config)
        assert engine is not None, "Failed to build TensorRT engine."
        self.engine = engine
        print("Done building TensorRT engine.")

    def load(self):
        with open(self.ENGINE_PATH, "rb") as f, trt.Runtime(self.logger) as runtime:
            engine = runtime.deserialize_cuda_engine(f.read())
        assert engine is not None, "Failed to load TensorRT engine."
        self.engine = engine
        print("Loaded TensorRT engine.")

    def save(self):
        assert self.engine is not None, "Saving before created."
        serialized = self.engine.serialize()
        with open(self.ENGINE_PATH, "wb") as f:
            f.write(serialized)
        print("Serialized TensorRT engine.")
Relevant ONNX creation code:
import onnx
import onnxsim
import torch


def pytorch_to_onnx(pytorch_model, model_path):
    # Export the PyTorch model to ONNX
    pytorch_model.eval()
    pytorch_model.cuda()
    x = torch.randn(1, 3, 224, 224, requires_grad=True, device="cuda")
    torch.onnx.export(
        pytorch_model, x, model_path, export_params=True, opset_version=10,
        input_names=["input"], output_names=["output"]
    )
    # Simplify the ONNX graph, check it, and overwrite the exported file
    model, status = onnxsim.simplify(onnx.load(model_path))
    onnx.checker.check_model(model)
    with open(model_path, "wb") as f:
        onnx.save(model, f)
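And the export itself is just (a sketch; torchvision's resnet18 is only a stand-in here for the pre-trained model I'm actually exporting):

import torchvision

# resnet18 is a stand-in for illustration; the real model is a different
# pre-trained network, exported through the same function.
pytorch_to_onnx(torchvision.models.resnet18(pretrained=True), "trt_model.onnx")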
I’m guessing I might be doing something wrong in the __call__() function, since no error is thrown at any other step. I believe I’m following the Python API documentation listed here almost exactly.