Could not find any implementation for node {ForeignNode[onnx::MatMul_8444 + (Unnamed Layer* 1931) [Shuffle].../ScatterND_14]}

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): NVIDIA RTX A5000
• DeepStream Version: 6.3
• TensorRT Version: 8.6.1.6
• CUDA Version: 12.2
• NVIDIA GPU Driver Version (valid for GPU only): 535.129.03
• Issue Type (questions, new requirements, bugs): bugs
• How to reproduce the issue? (For bugs: include which sample app is used, the content of the configuration files, the command line used, and other details for reproducing)
• Requirement details (For new requirements: include the module name, i.e. for which plugin or which sample application, and the function description)

Hello,

I’m trying to use an ONNX model in DeepStream, but building the TensorRT engine fails for any batch size greater than one, with the following error:

cuda_codegen.hpp:604: DCHECK(it != shape_name_map_.end()) failed. Shape tensor is not found in shape_name_map: __mye85697-HOST-(i64[1][1]so[0]p[0], mem_prop=100)
cuda_codegen.hpp:604: DCHECK(it != shape_name_map_.end()) failed. Shape tensor is not found in shape_name_map: __mye85697-HOST-(i64[1][1]so[0]p[0], mem_prop=100)
cuda_codegen.hpp:604: DCHECK(it != shape_name_map_.end()) failed. Shape tensor is not found in shape_name_map: __mye85660-HOST-(i64[1][1]so[0]p[0], mem_prop=100)
ERROR: [TRT]: 10: Could not find any implementation for node {ForeignNode[onnx::MatMul_8444 + (Unnamed Layer* 1931) [Shuffle].../ScatterND_14]}.
ERROR: [TRT]: 10: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::MatMul_8444 + (Unnamed Layer* 1931) [Shuffle].../ScatterND_14]}.)
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:1124 Build engine failed from config file
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:816 failed to build trt engine.
0:05:14.693655320   332 0x5624d87c0090 ERROR                nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger:<nvinfer0> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2022> [UID = 1]: build engine file failed
0:05:14.856545171   332 0x5624d87c0090 ERROR                nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger:<nvinfer0> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2108> [UID = 1]: build backend context failed
0:05:14.856564079   332 0x5624d87c0090 ERROR                nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger:<nvinfer0> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1282> [UID = 1]: generate backend failed, check config file settings
0:05:14.856579200   332 0x5624d87c0090 WARN                 nvinfer gstnvinfer.cpp:898:gst_nvinfer_start:<nvinfer0> error: Failed to create NvDsInferContext instance
0:05:14.856582381   332 0x5624d87c0090 WARN                 nvinfer gstnvinfer.cpp:898:gst_nvinfer_start:<nvinfer0> error: Config file path: /build/TestModels/cltr/config_infer_crowd_count_cltr.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED

I increased workspace-size to 6144 (MB), but it didn’t fix the issue.
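For reference, the workspace size is set in the [property] section of the nvinfer config file; a minimal excerpt (all other keys omitted) looks roughly like this:

```
[property]
# Engine build workspace in MB
workspace-size=6144
```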

I also ran two other experiments with different DeepStream and TensorRT versions.

Experiment #1

Using nvcr.io/nvidia/deepstream:6.3-samples docker image

  • DeepStream Version: 6.3

  • TensorRT Version: 8.5.3.1

The engine build failed with the same error when using the default workspace-size, but succeeded after setting workspace-size to 6144.

Experiment #2

Using nvcr.io/nvidia/deepstream:6.4-samples-multiarch docker image

  • DeepStream Version: 6.4

  • TensorRT Version: 8.6.1.6

The engine build failed with the same error both with the default workspace-size and with workspace-size set to 6144.

You can find the model here.

This seems to be a model-related issue. Have you tried building the engine with trtexec?

The command may look like this:
/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --minShapes=samples:1x3x768x1024 --optShapes=samples:4x3x768x1024 --maxShapes=samples:4x3x768x1024 --fp16 --saveEngine=./model.onnx_fp16_b4.engine

Please also upload your nvinfer configuration file.

Here is the nvinfer configuration file: config.txt (685 Bytes)

The build failed with trtexec, but I was able to build the engine with the TensorRT Python API.

Here is the Python script I used to generate the engine:
ONNX_to_tensorRT.txt (6.4 KB)
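(The script follows the standard TensorRT 8.x Python build flow with an explicit optimization profile; a rough, untested sketch of that flow is below. The input name "samples" and the 3x768x1024 shape are taken from the trtexec command above; file names and everything else are illustrative, not the actual attached script.)

```python
# Hypothetical sketch of a dynamic-batch engine build with the
# TensorRT 8.x Python API. Requires a TensorRT install and a GPU.
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.VERBOSE)

def build_engine(onnx_path="model.onnx", engine_path="model_fp16_b4.engine"):
    builder = trt.Builder(LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)
    # 6 GiB workspace, matching the workspace-size experiments above
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 6 << 30)

    # Dynamic batch: min 1, opt/max 4, as in the trtexec command
    profile = builder.create_optimization_profile()
    profile.set_shape("samples",
                      min=(1, 3, 768, 1024),
                      opt=(4, 3, 768, 1024),
                      max=(4, 3, 768, 1024))
    config.add_optimization_profile(profile)

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("Engine build failed")
    with open(engine_path, "wb") as f:
        f.write(serialized)
```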

I could not reproduce the issue on our devices with your configuration file.

With TensorRT version 8.6.1.6?

With the two Docker containers you mentioned earlier in this thread.

I tried generating the engine with different batch sizes in the nvcr.io/nvidia/deepstream:6.4-samples-multiarch docker container, and it only worked with batch-size=1.

Here is the pipeline I used

gst-launch-1.0 uridecodebin uri=file:///video_file.mp4 ! muxer.sink_0 nvstreammux name=muxer width=1280 height=720 batch-size=1 ! nvinfer config-file-path=config.txt ! nvvideoconvert ! fakesink
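For the larger batch sizes I raised the muxer batch size accordingly; a sketch of the batch-4 variant (assuming the batch-size in config.txt is also set to 4 to match) would be:

```
gst-launch-1.0 uridecodebin uri=file:///video_file.mp4 ! muxer.sink_0 nvstreammux name=muxer width=1280 height=720 batch-size=4 ! nvinfer config-file-path=config.txt ! nvvideoconvert ! fakesink
```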

It works with batch-size=4 on our side; no issue was found with the nvcr.io/nvidia/deepstream:6.4-samples-multiarch docker container.

I enabled verbose logging in trtexec and got the following logs:

[03/13/2024-14:40:46] [V] [TRT] =============== Computing costs for {ForeignNode[onnx::MatMul_8444 + (Unnamed Layer* 1931) [Shuffle].../ScatterND_14]}
[03/13/2024-14:40:46] [V] [TRT] *************** Autotuning format combination: Bool(64,8,1), Float(16384,64,8,1), Float(4096,512,64,1), Float(4096,512,64,1), Float(4096,512,64,1), Float(4096,512,64,1), Float((* 768 N),64,1), Float((* 768 N),64,1), Float((* 768 N),64,1), Float((* 768 N),64,1), Float(1400,2,1) -> Float(25200,2100,3,1) where E0=(* 768 N) ***************
[03/13/2024-14:40:46] [V] [TRT] --------------- Timing Runner: {ForeignNode[onnx::MatMul_8444 + (Unnamed Layer* 1931) [Shuffle].../ScatterND_14]} (Myelin[0x80000023])
cuda_codegen.hpp:604: DCHECK(it != shape_name_map_.end()) failed. Shape tensor is not found in shape_name_map: __mye85777-HOST-(i64[1][1]so[0]p[0], mem_prop=100)
[03/13/2024-14:42:08] [V] [TRT] Skipping tactic 0x0000000000000000 due to exception No Myelin Error exists
[03/13/2024-14:42:08] [V] [TRT] {ForeignNode[onnx::MatMul_8444 + (Unnamed Layer* 1931) [Shuffle].../ScatterND_14]} (Myelin[0x80000023]) profiling completed in 82.4017 seconds. Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[03/13/2024-14:42:08] [V] [TRT] *************** Autotuning format combination: Bool(64,8,1), Half(16384,64,8,1), Half(4096,512,64,1), Half(4096,512,64,1), Half(4096,512,64,1), Half(4096,512,64,1), Half((* 768 N),64,1), Half((* 768 N),64,1), Half((* 768 N),64,1), Half((* 768 N),64,1), Half(1400,2,1) -> Half(25200,2100,3,1) where E0=(* 768 N) ***************
[03/13/2024-14:42:08] [V] [TRT] --------------- Timing Runner: {ForeignNode[onnx::MatMul_8444 + (Unnamed Layer* 1931) [Shuffle].../ScatterND_14]} (Myelin[0x80000023])
cuda_codegen.hpp:604: DCHECK(it != shape_name_map_.end()) failed. Shape tensor is not found in shape_name_map: __mye85777-HOST-(i64[1][1]so[0]p[0], mem_prop=100)
[03/13/2024-14:43:02] [V] [TRT] Skipping tactic 0x0000000000000000 due to exception No Myelin Error exists
[03/13/2024-14:43:02] [V] [TRT] {ForeignNode[onnx::MatMul_8444 + (Unnamed Layer* 1931) [Shuffle].../ScatterND_14]} (Myelin[0x80000023]) profiling completed in 54.4271 seconds. Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[03/13/2024-14:43:02] [V] [TRT] *************** Autotuning format combination: Bool(64,8,1), Half(2048,1:8,256,32), Half(512,1:8,64,1), Half(512,1:8,64,1), Half(512,1:8,64,1), Half(512,1:8,64,1), Float((* 768 N),64,1), Float((* 768 N),64,1), Float((* 768 N),64,1), Float((* 768 N),64,1), Float(1400,2,1) -> Half(4200,1:8,6,2) where E0=(* 768 N) ***************
[03/13/2024-14:43:02] [V] [TRT] --------------- Timing Runner: {ForeignNode[onnx::MatMul_8444 + (Unnamed Layer* 1931) [Shuffle].../ScatterND_14]} (Myelin[0x80000023])
cuda_codegen.hpp:604: DCHECK(it != shape_name_map_.end()) failed. Shape tensor is not found in shape_name_map: __mye85740-HOST-(i64[1][1]so[0]p[0], mem_prop=100)
[03/13/2024-14:43:56] [V] [TRT] Skipping tactic 0x0000000000000000 due to exception No Myelin Error exists
[03/13/2024-14:43:56] [V] [TRT] {ForeignNode[onnx::MatMul_8444 + (Unnamed Layer* 1931) [Shuffle].../ScatterND_14]} (Myelin[0x80000023]) profiling completed in 53.0703 seconds. Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[03/13/2024-14:43:56] [V] [TRT] Deleting timing cache: 269 entries, served 197 hits since creation.
[03/13/2024-14:43:56] [E] Error[10]: Could not find any implementation for node {ForeignNode[onnx::MatMul_8444 + (Unnamed Layer* 1931) [Shuffle].../ScatterND_14]}.
[03/13/2024-14:43:56] [E] Error[10]: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::MatMul_8444 + (Unnamed Layer* 1931) [Shuffle].../ScatterND_14]}.)

We could not reproduce the error.