Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::MatMul_9665 + (Unnamed Layer* 387)

Description

I’m trying to convert an ONNX file to a TensorRT engine with trtexec, but it raises the following error:

[02/09/2023-08:26:09] [W] [TRT] Skipping tactic 0x0000000000000000 due to Myelin error: autotuning: CUDA error 2 allocating 6443238909-byte buffer: out of memory
[02/09/2023-08:26:09] [E] Error[10]: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::MatMul_9665 + (Unnamed Layer* 387) [Shuffle].../input_blocks.1/input_blocks.1.1/Reshape_2 + /input_blocks.1/input_blocks.1.1/Transpose_1 + /input_blocks.1/input_blocks.1.1/Reshape_3]}.)
[02/09/2023-08:26:09] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[02/09/2023-08:26:09] [E] Engine could not be created from network
[02/09/2023-08:26:09] [E] Building engine failed
[02/09/2023-08:26:09] [E] Failed to create engine from model or file.
[02/09/2023-08:26:09] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8501] # trtexec --onnx=unet_ldm.onnx --saveEngine=unet_ldm.trt

Environment

NVIDIA Docker container 22.12

Relevant Files

The model I use is the official Stable Diffusion UNet model.

Steps To Reproduce

Use this script to convert the UNet to ONNX.

import torch
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

# Build the UNet from the Stable Diffusion v1 inference config
config = OmegaConf.load("../../stable-diffusion/configs/stable-diffusion/v1-inference.yaml")
config = config.model.params.unet_config
unet = instantiate_from_config(config)
model = unet.eval().cuda()

# Dummy inputs: latents (B, 4, 64, 64), timesteps (B,), text conditioning (B, 77, 768)
input_0 = torch.randn(6, 4, 64, 64, dtype=torch.float32).cuda()
input_1 = torch.tensor([1, 3, 7, 8, 9, 23], dtype=torch.int32).cuda()
input_2 = torch.randn(6, 77, 768, dtype=torch.float32).cuda()

with torch.no_grad():
    torch.onnx.export(unet, (input_0, input_1, input_2), 'unet_ldm.onnx')

Then use trtexec --onnx=unet_ldm.onnx --saveEngine=unet_ldm.trt to generate the engine.

I get the following output with --verbose:

[02/09/2023-08:45:48] [V] [TRT] *************** Autotuning format combination: Float(1310720,4096,64,1), Float(59136,768,1) -> Float(1310720,4096,64,1) ***************
[02/09/2023-08:45:48] [V] [TRT] --------------- Timing Runner: {ForeignNode[onnx::MatMul_9665 + (Unnamed Layer* 387) [Shuffle].../input_blocks.1/input_blocks.1.1/Reshape_2 + /input_blocks.1/input_blocks.1.1/Transpose_1 + /input_blocks.1/input_blocks.1.1/Reshape_3]} (Myelin)
[02/09/2023-08:45:48] [W] [TRT] Skipping tactic 0x0000000000000000 due to Myelin error: autotuning: CUDA error 2 allocating 6443238909-byte buffer: out of memory
[02/09/2023-08:45:48] [V] [TRT] Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[02/09/2023-08:45:48] [V] [TRT] Deleting timing cache: 820 entries, served 8637 hits since creation.
[02/09/2023-08:45:49] [E] Error[10]: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::MatMul_9665 + (Unnamed Layer* 387) [Shuffle].../input_blocks.1/input_blocks.1.1/Reshape_2 + /input_blocks.1/input_blocks.1.1/Transpose_1 + /input_blocks.1/input_blocks.1.1/Reshape_3]}.)

I’m not sure whether this is because I don’t have enough GPU memory or it is a TensorRT internal bug.
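To check the memory side, here is a minimal sketch I use to compare free GPU memory against the 6443238909-byte buffer the skipped tactic tried to allocate (torch.cuda.mem_get_info is available in recent PyTorch releases):

import torch

# Free vs. total memory (in bytes) on the current CUDA device
free, total = torch.cuda.mem_get_info()
print(f"free: {free / 1e9:.2f} GB, total: {total / 1e9:.2f} GB")

# The skipped Myelin tactic tried to allocate a 6443238909-byte (~6.4 GB) buffer
print("enough free memory for that buffer:", free > 6443238909)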

Could you please share the ONNX model with us?
We are unable to run the script successfully due to an ldm library issue.

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    from ldm.util import instantiate_from_config
  File "/usr/local/lib/python3.8/dist-packages/ldm.py", line 20
    print self.face_rec_model_path
          ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(self.face_rec_model_path)?

The ONNX model is too large to share.

Actually, the ldm repo is https://github.com/CompVis/stable-diffusion (the ldm.py under dist-packages in your traceback is an unrelated package).

After you install this repo, you can import ldm.

Besides, you need to replace flag with False in this line to disable checkpointing; otherwise the ONNX cannot be exported normally.
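For reference, the change looks roughly like this; I assume the line in question is the checkpoint helper in ldm/modules/diffusionmodules/util.py (CheckpointFunction is defined in the same file), so the path may differ in your checkout:

def checkpoint(func, inputs, params, flag):
    # Replace "flag" with False so the plain forward pass is traced;
    # the CheckpointFunction autograd path breaks torch.onnx.export
    if False:  # was: if flag:
        args = tuple(inputs) + tuple(params)
        return CheckpointFunction.apply(func, len(inputs), *args)
    else:
        return func(*inputs)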

Please share it via a Google Drive link.

Here is the download link: https://cloud.tsinghua.edu.cn/f/61a62eb8b80b4252aa80/?dl=1

The network is too slow; sorry for the delay.

Hi,

The files you shared were not helpful.
Could you please share unet_ldm.onnx with us?

The unet_ldm.onnx is in the shared files. For ONNX models larger than 2 GB, PyTorch exports them together with external weight files.
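So unet_ldm.onnx and the exported weight files need to stay in the same directory; a minimal sketch to verify the model loads together with its external data (assuming the onnx Python package is installed):

import onnx

# onnx.load resolves external weight files relative to the model's own path,
# so unet_ldm.onnx and the exported weight files must sit side by side
model = onnx.load("unet_ldm.onnx")
print(len(model.graph.node), "nodes loaded")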

Have you fixed the problem?