Error in InferenceOp with TAO UNET model

Hello,

I am trying to deploy a UNet model that I trained with the NVIDIA TAO Toolkit and exported as an ONNX file. The ONNX model checker reports that the model is valid.

After converting the model to a TRT engine file, however, the inference operator throws the following error:

[error] [infer_utils.cpp:41] %s

Model Properties:

format: ONNX v6
producer: tf2onnx 1.9.2
version: 0
imports: ai.onnx v11
graph: tf2onnx
description: test
INPUTS:
  input_1:0
    name: input_1:0
    tensor: float32[1,3,224,224]
OUTPUTS:
  argmax_1
    name: argmax_1
    tensor: int64[1,224,224,1]

Code:

        # source parameters
        source_width = 1920
        source_height = 1080
        bpp = 4  # bytes per channel (float32 output of the FormatConverterOp)
        n_channels = 3  # RGB
        in_dtype = "rgb888"
        source_block_size = source_width * source_height * n_channels * bpp
        source_num_blocks = 2

        source_pool_kwargs = dict(
            storage_type=MemoryStorageType.DEVICE,
            block_size=source_block_size,
            num_blocks=source_num_blocks,
        )

        # inference parameters
        inference_width = 224
        inference_height = 224
        inference_n_channels = 3
        inference_block_size = inference_width * inference_height * inference_n_channels * bpp
        inference_num_blocks = 4

        inference_pool_kwargs = dict(
            storage_type=MemoryStorageType.DEVICE,
            block_size=inference_block_size,
            num_blocks=inference_num_blocks,
        )

        # Define the replayer and holoviz operators
        replayer = VideoStreamReplayerOp(
            self,
            name="Replayer",
            directory=VIDEO_DIR,
            basename=VIDEO_BASENAME,
            frame_rate=0,
            repeat=True,
            realtime=True,
        )

        cuda_stream_pool = CudaStreamPool(
            self,
            name="CudaStream",
            dev_id=0,
            stream_flags=0,
            stream_priority=0,
            reserved_size=1,
            max_size=5,
        )

        format_converter = FormatConverterOp(
            self,
            name="FormatConverter",
            resize_width=inference_width,
            resize_height=inference_height,
            scale_min=0,
            scale_max=1,
            out_dtype="float32",
            in_dtype=in_dtype,
            pool=BlockMemoryPool(self, name="pool", **source_pool_kwargs),
            cuda_stream_pool=cuda_stream_pool
        )

        preprocessor = PreprocessorOp(
            self,
            name="Preprocessor",
            permute_axes=[2, 0, 1],
            reshape=[1, inference_n_channels, inference_height, inference_width],  # NCHW
            ascontiguous=True
        )

        # inference op
        self.model_path_map = {
            "unet_tool_segmentation": os.path.join(MODELS_PATH, "model.fixed.onnx"),
        }
        
        pre_processor_map = {"unet_tool_segmentation": ["input_1:0"]}
        inference_map = {"unet_tool_segmentation": ["argmax_1"]}
        
        inference = InferenceOp(
            self,
            name="Inference",
            model_path_map=self.model_path_map,
            allocator=BlockMemoryPool(self, name="pool", **inference_pool_kwargs),
            backend="trt",
            pre_processor_map=pre_processor_map,
            inference_map=inference_map,
            parallel_inference=True,
            infer_on_cpu=False,
            enable_fp16=True,
            input_on_cuda=True,
            output_on_cuda=True,
            transmit_on_cuda=True,
            is_engine_path=False
        )
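For reference, the HWC-to-NCHW transform that the PreprocessorOp above is configured for (`permute_axes=[2, 0, 1]`, then the reshape) can be sketched in NumPy; this is a minimal stand-in, assuming a float32 HWC frame as produced by the FormatConverterOp:

```python
import numpy as np

h, w, c = 224, 224, 3
frame_hwc = np.zeros((h, w, c), dtype=np.float32)  # FormatConverter output: HWC, float32

# permute_axes=[2, 0, 1]: HWC -> CHW, made contiguous, then batched to NCHW
chw = np.ascontiguousarray(frame_hwc.transpose(2, 0, 1))
nchw = chw.reshape(1, c, h, w)

print(nchw.shape)  # (1, 3, 224, 224) -- matches the model input input_1:0
```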

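The pool block sizes above are plain byte arithmetic; as a sanity check (all values taken from the snippet, with 4 bytes per float32 value):

```python
# Block-size arithmetic for the two BlockMemoryPools above
source_width, source_height = 1920, 1080
inference_width, inference_height = 224, 224
n_channels = 3
bpp = 4  # bytes per float32 value

source_block_size = source_width * source_height * n_channels * bpp
inference_block_size = inference_width * inference_height * n_channels * bpp

print(source_block_size)     # 24883200 bytes (~23.7 MiB per 1920x1080 RGB float32 frame)
print(inference_block_size)  # 602112 bytes per 1x3x224x224 float32 tensor
```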
Log:

root@cagx:/workspace/storage#  cd /workspace/storage ; /usr/bin/env /bin/python3 /root/.vscode-server/extensions/ms-python.debugpy-2024.4.0-linux-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher 44447 -- /workspace/storage/repos/nvidia-holoscan/video-streaming-platform/src/nvidia_holoscan/applications/tao_unet/app.py 
[info] [gxf_executor.cpp:210] Creating context
[info] [gxf_executor.cpp:1595] Loading extensions from configs...
[info] [gxf_executor.cpp:1741] Activating Graph...
[info] [resource_manager.cpp:79] ResourceManager cannot find Resource of type: nvidia::gxf::GPUDevice for entity [eid: 00002, name: __entity_2]
[info] [resource_manager.cpp:106] ResourceManager cannot find Resource of type: nvidia::gxf::GPUDevice for component [cid: 00003, name: CudaStream]
[info] [resource.hpp:44] Resource [type: nvidia::gxf::GPUDevice] from component [cid: 3] cannot find its value from ResourceManager
[info] [resource_manager.cpp:79] ResourceManager cannot find Resource of type: nvidia::gxf::GPUDevice for entity [eid: 00004, name: __entity_4]
[info] [resource_manager.cpp:106] ResourceManager cannot find Resource of type: nvidia::gxf::GPUDevice for component [cid: 00005, name: pool]
[info] [resource.hpp:44] Resource [type: nvidia::gxf::GPUDevice] from component [cid: 5] cannot find its value from ResourceManager
[info] [resource_manager.cpp:79] ResourceManager cannot find Resource of type: nvidia::gxf::GPUDevice for entity [eid: 00006, name: __entity_6]
[info] [resource_manager.cpp:106] ResourceManager cannot find Resource of type: nvidia::gxf::GPUDevice for component [cid: 00007, name: pool]
[info] [resource.hpp:44] Resource [type: nvidia::gxf::GPUDevice] from component [cid: 7] cannot find its value from ResourceManager
[info] [resource_manager.cpp:79] ResourceManager cannot find Resource of type: nvidia::gxf::GPUDevice for entity [eid: 00008, name: __entity_8]
[info] [resource_manager.cpp:106] ResourceManager cannot find Resource of type: nvidia::gxf::GPUDevice for component [cid: 00009, name: pool]
[info] [resource.hpp:44] Resource [type: nvidia::gxf::GPUDevice] from component [cid: 9] cannot find its value from ResourceManager
[info] [gxf_executor.cpp:1771] Running Graph...
[info] [gxf_executor.cpp:1773] Waiting for completion...
[info] [gxf_executor.cpp:1774] Graph execution waiting. Fragment: 
[info] [greedy_scheduler.cpp:190] Scheduling 7 entities
[info] [context.cpp:50] _______________
[info] [context.cpp:50] Vulkan Version:
[info] [context.cpp:50]  - available:  1.2.131
[info] [context.cpp:50]  - requesting: 1.2.0
[info] [context.cpp:50] ______________________
[info] [context.cpp:50] Used Instance Layers :
[info] [context.cpp:50] 
[info] [context.cpp:50] Used Instance Extensions :
[info] [context.cpp:50] VK_EXT_debug_utils
[info] [context.cpp:50] VK_KHR_external_memory_capabilities
[info] [context.cpp:50] ____________________
[info] [context.cpp:50] Compatible Devices :
[info] [context.cpp:50] 0: Quadro RTX 6000
[info] [context.cpp:50] Physical devices found : 
[info] [context.cpp:50] 1
[info] [context.cpp:50] ________________________
[info] [context.cpp:50] Used Device Extensions :
[info] [context.cpp:50] VK_KHR_external_memory
[info] [context.cpp:50] VK_KHR_external_memory_fd
[info] [context.cpp:50] VK_KHR_external_semaphore
[info] [context.cpp:50] VK_KHR_external_semaphore_fd
[info] [context.cpp:50] VK_KHR_push_descriptor
[info] [context.cpp:50] VK_EXT_line_rasterization
[info] [context.cpp:50] 
[info] [vulkan_app.cpp:777] Using device 0: Quadro RTX 6000
[info] [infer_utils.cpp:222] Input tensor names empty from Config. Creating from pre_processor map.
[info] [infer_utils.cpp:224] Input Tensor names: [input_1:0]
[info] [infer_utils.cpp:258] Output tensor names empty from Config. Creating from inference map.
[info] [infer_utils.cpp:260] Output Tensor names: [argmax_1]
[info] [inference.cpp:202] Inference Specifications created
[info] [core.cpp:46] TRT Inference: converting ONNX model at /workspace/storage/models/unet/model.fixed.onnx
[info] [utils.cpp:81] Cached engine found: /workspace/storage/models/unet/model.fixed.QuadroRTX6000.7.5.72.trt.8.2.3.0.engine.fp16
[info] [core.cpp:79] Loading Engine: /workspace/storage/models/unet/model.fixed.QuadroRTX6000.7.5.72.trt.8.2.3.0.engine.fp16
[info] [core.cpp:122] Engine loaded: /workspace/storage/models/unet/model.fixed.QuadroRTX6000.7.5.72.trt.8.2.3.0.engine.fp16
[info] [infer_manager.cpp:343] HoloInfer buffer created for argmax_1
[info] [inference.cpp:213] Inference context setup complete
[info] [holoviz.cpp:1425] Input spec:
- type: color
  name: ""
  opacity: 1.000000
  priority: 0

[error] [infer_utils.cpp:41] %s

[error] [gxf_wrapper.cpp:68] Exception occurred for operator: 'Inference' - Error in Inference Operator, Sub-module->Tick, Inference execution, Message->Error in Inference Operator, Sub-module->Tick, Data extraction
[error] [entity_executor.cpp:529] Failed to tick codelet Inference in entity: Inference code: GXF_FAILURE
[warning] [greedy_scheduler.cpp:242] Error while executing entity 62 named 'Inference': GXF_FAILURE
[info] [greedy_scheduler.cpp:398] Scheduler finished.
[error] [program.cpp:556] wait failed. Deactivating...
[error] [runtime.cpp:1408] Graph wait failed with error: GXF_FAILURE
[warning] [gxf_executor.cpp:1775] GXF call GxfGraphWait(context) in line 1775 of file /workspace/holoscan-sdk/src/core/executors/gxf/gxf_executor.cpp failed with 'GXF_FAILURE' (1)
[error] [gxf_executor.cpp:1779] GxfGraphWait Error: GXF_FAILURE

Setting the log level to DEBUG/TRACE doesn't yield any more information. Do you have an idea what the issue could be?

Sorry for the late reply. What are the input/output tensor shapes of your ONNX model? It may be a tensor shape mismatch.