Error in InferenceOp with TAO UNET model

Hello,

I am trying to deploy a UNet model that I trained with the NVIDIA TAO Toolkit and exported as an ONNX file. The ONNX model checker reports that the model is valid.

After converting the model to a TRT engine file, however, the inference operator throws the following error:

[error] [infer_utils.cpp:41] %s

Model Properties:

format: ONNX v6
producer: tf2onnx 1.9.2
version: 0
imports: ai.onnx v11
graph: tf2onnx
description: test
INPUTS:
  input_1:0
    name: input_1:0
    tensor: float32[1,3,224,224]
OUTPUTS:
  argmax_1
    name: argmax_1
    tensor: int64[1,224,224,1]

Code:

        # source parameters
        source_width = 1920
        source_height = 1080
        bpp = 4  # bytes per channel (float32 output of the FormatConverterOp)
        n_channels = 3  # RGB
        in_dtype = "rgb888"
        source_block_size = source_width * source_height * n_channels * bpp
        source_num_blocks = 2

        source_pool_kwargs = dict(
            storage_type=MemoryStorageType.DEVICE,
            block_size=source_block_size,
            num_blocks=source_num_blocks,
        )

        # inference parameters
        inference_width = 224
        inference_height = 224
        inference_n_channels = 3
        inference_block_size = inference_width * inference_height * inference_n_channels * bpp
        inference_num_blocks = 4

        inference_pool_kwargs = dict(
            storage_type=MemoryStorageType.DEVICE,
            block_size=inference_block_size,
            num_blocks=inference_num_blocks,
        )

        # Define the replayer and holoviz operators
        replayer = VideoStreamReplayerOp(
            self,
            name="Replayer",
            directory=VIDEO_DIR,
            basename=VIDEO_BASENAME,
            frame_rate=0,
            repeat=True,
            realtime=True,
        )

        cuda_stream_pool = CudaStreamPool(
            self,
            name="CudaStream",
            dev_id=0,
            stream_flags=0,
            stream_priority=0,
            reserved_size=1,
            max_size=5,
        )

        format_converter = FormatConverterOp(
            self,
            name="FormatConverter",
            resize_width=inference_width,
            resize_height=inference_height,
            scale_min=0,
            scale_max=1,
            out_dtype="float32",
            in_dtype=in_dtype,
            pool=BlockMemoryPool(self, name="pool", **source_pool_kwargs),
            cuda_stream_pool=cuda_stream_pool
        )

        preprocessor = PreprocessorOp(
            self,
            name="Preprocessor",
            permute_axes=[2, 0, 1],
            reshape=[1, inference_n_channels, inference_height, inference_width],  # NCHW
            ascontiguous=True
        )

        # inference op
        self.model_path_map = {
            "unet_tool_segmentation": os.path.join(MODELS_PATH, "model.fixed.onnx"),
        }
        
        pre_processor_map = {"unet_tool_segmentation": ["input_1:0"]}
        inference_map = {"unet_tool_segmentation": ["argmax_1"]}
        
        inference = InferenceOp(
            self,
            name="Inference",
            model_path_map=self.model_path_map,
            allocator=BlockMemoryPool(self, name="pool", **inference_pool_kwargs),
            backend="trt",
            pre_processor_map=pre_processor_map,
            inference_map=inference_map,
            parallel_inference=True,
            infer_on_cpu=False,
            enable_fp16=True,
            input_on_cuda=True,
            output_on_cuda=True,
            transmit_on_cuda=True,
            is_engine_path=False
        )
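For reference, the HWC-to-NCHW transform that the PreprocessorOp above is configured for (`permute_axes=[2, 0, 1]`, then the reshape) can be sketched in NumPy; this is a minimal stand-in, assuming a float32 HWC frame as produced by the FormatConverterOp:

```python
import numpy as np

h, w, c = 224, 224, 3
frame_hwc = np.zeros((h, w, c), dtype=np.float32)  # FormatConverter output: HWC, float32

# permute_axes=[2, 0, 1]: HWC -> CHW, made contiguous, then batched to NCHW
chw = np.ascontiguousarray(frame_hwc.transpose(2, 0, 1))
nchw = chw.reshape(1, c, h, w)

print(nchw.shape)  # (1, 3, 224, 224) -- matches the model input input_1:0
```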

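The pool block sizes above are plain byte arithmetic; as a sanity check (all values taken from the snippet, with 4 bytes per float32 value):

```python
# Block-size arithmetic for the two BlockMemoryPools above
source_width, source_height = 1920, 1080
inference_width, inference_height = 224, 224
n_channels = 3
bpp = 4  # bytes per float32 value

source_block_size = source_width * source_height * n_channels * bpp
inference_block_size = inference_width * inference_height * n_channels * bpp

print(source_block_size)     # 24883200 bytes (~23.7 MiB per 1920x1080 RGB float32 frame)
print(inference_block_size)  # 602112 bytes per 1x3x224x224 float32 tensor
```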
Log:

root@cagx:/workspace/storage#  cd /workspace/storage ; /usr/bin/env /bin/python3 /root/.vscode-server/extensions/ms-python.debugpy-2024.4.0-linux-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher 44447 -- /workspace/storage/repos/nvidia-holoscan/video-streaming-platform/src/nvidia_holoscan/applications/tao_unet/app.py 
[info] [gxf_executor.cpp:210] Creating context
[info] [gxf_executor.cpp:1595] Loading extensions from configs...
[info] [gxf_executor.cpp:1741] Activating Graph...
[info] [resource_manager.cpp:79] ResourceManager cannot find Resource of type: nvidia::gxf::GPUDevice for entity [eid: 00002, name: __entity_2]
[info] [resource_manager.cpp:106] ResourceManager cannot find Resource of type: nvidia::gxf::GPUDevice for component [cid: 00003, name: CudaStream]
[info] [resource.hpp:44] Resource [type: nvidia::gxf::GPUDevice] from component [cid: 3] cannot find its value from ResourceManager
[info] [resource_manager.cpp:79] ResourceManager cannot find Resource of type: nvidia::gxf::GPUDevice for entity [eid: 00004, name: __entity_4]
[info] [resource_manager.cpp:106] ResourceManager cannot find Resource of type: nvidia::gxf::GPUDevice for component [cid: 00005, name: pool]
[info] [resource.hpp:44] Resource [type: nvidia::gxf::GPUDevice] from component [cid: 5] cannot find its value from ResourceManager
[info] [resource_manager.cpp:79] ResourceManager cannot find Resource of type: nvidia::gxf::GPUDevice for entity [eid: 00006, name: __entity_6]
[info] [resource_manager.cpp:106] ResourceManager cannot find Resource of type: nvidia::gxf::GPUDevice for component [cid: 00007, name: pool]
[info] [resource.hpp:44] Resource [type: nvidia::gxf::GPUDevice] from component [cid: 7] cannot find its value from ResourceManager
[info] [resource_manager.cpp:79] ResourceManager cannot find Resource of type: nvidia::gxf::GPUDevice for entity [eid: 00008, name: __entity_8]
[info] [resource_manager.cpp:106] ResourceManager cannot find Resource of type: nvidia::gxf::GPUDevice for component [cid: 00009, name: pool]
[info] [resource.hpp:44] Resource [type: nvidia::gxf::GPUDevice] from component [cid: 9] cannot find its value from ResourceManager
[info] [gxf_executor.cpp:1771] Running Graph...
[info] [gxf_executor.cpp:1773] Waiting for completion...
[info] [gxf_executor.cpp:1774] Graph execution waiting. Fragment: 
[info] [greedy_scheduler.cpp:190] Scheduling 7 entities
[info] [context.cpp:50] _______________
[info] [context.cpp:50] Vulkan Version:
[info] [context.cpp:50]  - available:  1.2.131
[info] [context.cpp:50]  - requesting: 1.2.0
[info] [context.cpp:50] ______________________
[info] [context.cpp:50] Used Instance Layers :
[info] [context.cpp:50] 
[info] [context.cpp:50] Used Instance Extensions :
[info] [context.cpp:50] VK_EXT_debug_utils
[info] [context.cpp:50] VK_KHR_external_memory_capabilities
[info] [context.cpp:50] ____________________
[info] [context.cpp:50] Compatible Devices :
[info] [context.cpp:50] 0: Quadro RTX 6000
[info] [context.cpp:50] Physical devices found : 
[info] [context.cpp:50] 1
[info] [context.cpp:50] ________________________
[info] [context.cpp:50] Used Device Extensions :
[info] [context.cpp:50] VK_KHR_external_memory
[info] [context.cpp:50] VK_KHR_external_memory_fd
[info] [context.cpp:50] VK_KHR_external_semaphore
[info] [context.cpp:50] VK_KHR_external_semaphore_fd
[info] [context.cpp:50] VK_KHR_push_descriptor
[info] [context.cpp:50] VK_EXT_line_rasterization
[info] [context.cpp:50] 
[info] [vulkan_app.cpp:777] Using device 0: Quadro RTX 6000
[info] [infer_utils.cpp:222] Input tensor names empty from Config. Creating from pre_processor map.
[info] [infer_utils.cpp:224] Input Tensor names: [input_1:0]
[info] [infer_utils.cpp:258] Output tensor names empty from Config. Creating from inference map.
[info] [infer_utils.cpp:260] Output Tensor names: [argmax_1]
[info] [inference.cpp:202] Inference Specifications created
[info] [core.cpp:46] TRT Inference: converting ONNX model at /workspace/storage/models/unet/model.fixed.onnx
[info] [utils.cpp:81] Cached engine found: /workspace/storage/models/unet/model.fixed.QuadroRTX6000.7.5.72.trt.8.2.3.0.engine.fp16
[info] [core.cpp:79] Loading Engine: /workspace/storage/models/unet/model.fixed.QuadroRTX6000.7.5.72.trt.8.2.3.0.engine.fp16
[info] [core.cpp:122] Engine loaded: /workspace/storage/models/unet/model.fixed.QuadroRTX6000.7.5.72.trt.8.2.3.0.engine.fp16
[info] [infer_manager.cpp:343] HoloInfer buffer created for argmax_1
[info] [inference.cpp:213] Inference context setup complete
[info] [holoviz.cpp:1425] Input spec:
- type: color
  name: ""
  opacity: 1.000000
  priority: 0

[error] [infer_utils.cpp:41] %s

[error] [gxf_wrapper.cpp:68] Exception occurred for operator: 'Inference' - Error in Inference Operator, Sub-module->Tick, Inference execution, Message->Error in Inference Operator, Sub-module->Tick, Data extraction
[error] [entity_executor.cpp:529] Failed to tick codelet Inference in entity: Inference code: GXF_FAILURE
[warning] [greedy_scheduler.cpp:242] Error while executing entity 62 named 'Inference': GXF_FAILURE
[info] [greedy_scheduler.cpp:398] Scheduler finished.
[error] [program.cpp:556] wait failed. Deactivating...
[error] [runtime.cpp:1408] Graph wait failed with error: GXF_FAILURE
[warning] [gxf_executor.cpp:1775] GXF call GxfGraphWait(context) in line 1775 of file /workspace/holoscan-sdk/src/core/executors/gxf/gxf_executor.cpp failed with 'GXF_FAILURE' (1)
[error] [gxf_executor.cpp:1779] GxfGraphWait Error: GXF_FAILURE

Setting the log level to DEBUG/TRACE doesn't yield any more information. Do you have an idea what the issue could be?

Sorry for the late reply. What are the input/output tensor shapes of your ONNX model? It may be a tensor shape mismatch.