[ir_graph_builder.cpp:myelinGraphSetInputShapeProfile:254] Called with invalid shape profile, expect min <= common <= max on input 8)

Description

I’m trying to convert the FLUX.1-dev ONNX model.onnx to a TensorRT .engine, ideally in BF16.

I have already converted the T5, VAE, and CLIP models, but I’m out of ideas for how to set the shape profile of the transformer.

[05/22/2025-07:52:41] [TRT] [W] Detected layernorm nodes in FP16.
[05/22/2025-07:52:41] [TRT] [W] Running layernorm after self-attention with FP16 Reduce or Pow may cause overflow. Forcing Reduce or Pow Layers in FP32 precision, or exporting the model to use INormalizationLayer (available with ONNX opset >= 17) can help preserving accuracy.
[05/22/2025-07:54:05] [TRT] [E] Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [ir_graph_builder.cpp:myelinGraphSetInputShapeProfile:254] Called with invalid shape profile, expect min <= common <= max on input 8).
[05/22/2025-07:55:30] [TRT] [E] Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [ir_graph_builder.cpp:myelinGraphSetInputShapeProfile:254] Called with invalid shape profile, expect min <= common <= max on input 8).
[05/22/2025-07:55:31] [TRT] [E] IBuilder::buildSerializedNetwork: Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/Cast.../proj_out/Add]}.)
Traceback (most recent call last):
  File "/root/workspace/convert_model.py", line 106, in <module>
    convert_to_engine(f"/root/workspace/model/FLUX.1-dev-onnx/{model_name}/1/model_copy.onnx", f"/root/workspace/model/FLUX.1-dev-onnx/{model_name}/1/model.engine", input_ids)
  File "/root/workspace/convert_model.py", line 96, in convert_to_engine
    raise RuntimeError("Failed to create engine")
RuntimeError: Failed to create engine

Environment

nvcr.io/nvidia/tensorrt:24.03-py3 container bash

TensorRT Version: 8.6.3
GPU Type: H100
Nvidia Driver Version: 550.163.01
CUDA Version: 12.4
CUDNN Version:
Operating System + Version: ubuntu 22.04
Python Version (if applicable): 3.10.12
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.7
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Installed packages (pip freeze):

accelerate==1.7.0
blinker==1.4
certifi==2025.4.26
charset-normalizer==3.4.2
coloredlogs==15.0.1
cryptography==3.4.8
dbus-python==1.2.18
diffusers==0.33.1
distro==1.7.0
filelock==3.18.0
flatbuffers==25.2.10
fsspec==2025.5.0
hf-xet==1.1.2
httplib2==0.20.2
huggingface-hub==0.31.4
humanfriendly==10.0
idna==3.10
importlib-metadata==4.6.4
jeepney==0.7.1
Jinja2==3.1.6
keyring==23.5.0
launchpadlib==1.10.16
lazr.restfulclient==0.14.4
lazr.uri==1.0.6
markdown-it-py==3.0.0
MarkupSafe==3.0.2
mdurl==0.1.2
more-itertools==8.10.0
mpmath==1.3.0
netron==8.3.4
networkx==3.4.2
numpy==1.26.4
nvidia-cublas-cu12==12.6.4.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
nvidia-cufft-cu12==11.3.0.4
nvidia-cufile-cu12==1.11.1.6
nvidia-curand-cu12==10.3.7.77
nvidia-cusolver-cu12==11.7.1.2
nvidia-cusparse-cu12==12.5.4.2
nvidia-cusparselt-cu12==0.6.3
nvidia-nccl-cu12==2.26.2
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.6.77
oauthlib==3.2.0
onnx==1.18.0
onnxconverter-common==1.13.0
onnxruntime==1.22.0
onnxsim==0.4.36
packaging==25.0
pillow==11.2.1
polygraphy==0.49.22
protobuf==6.31.0
psutil==7.0.0
Pygments==2.19.1
PyGObject==3.42.1
PyJWT==2.3.0
pyparsing==2.4.7
python-apt==2.4.0+ubuntu3
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
rich==14.0.0
safetensors==0.5.3
SecretStorage==3.3.1
sentencepiece==0.2.0
six==1.16.0
ssh-import-id==5.11
sympy==1.14.0
tensorrt==10.11.0.33
tensorrt_cu12==10.11.0.33
tensorrt_cu12_bindings==10.11.0.33
tensorrt_cu12_libs==10.11.0.33
tokenizers==0.21.1
torch==2.7.0
tqdm==4.67.1
transformers==4.52.2
triton==3.3.0
typing_extensions==4.13.2
urllib3==2.4.0
wadllib==1.3.6
zipp==1.0.0

    # Body of convert_to_engine(onnx_file_path, engine_file_path, input_ids);
    # model_name is a module-level variable set before the call.
    import os
    import onnx  # needed for onnx.load below
    import tensorrt as trt
    
    model_dir = os.path.dirname(onnx_file_path)
    os.chdir(model_dir)

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)  

    parser = trt.OnnxParser(network, TRT_LOGGER)

    model = onnx.load(onnx_file_path)
    print("opset version", model.opset_import)

    print("ONNX input list:")
    for i, inp in enumerate(model.graph.input):
        shape = [
            dim.dim_value if dim.HasField("dim_value") else "?" 
            for dim in inp.type.tensor_type.shape.dim
        ]
        print(f"[{i}] name: {inp.name}, shape: {shape}")
    print("end of input")
    
    with open(os.path.basename(onnx_file_path), 'rb') as model_file:  # renamed to avoid shadowing the onnx model above
        if not parser.parse(model_file.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parsing failed.")

    if network.num_layers == 0:
        raise RuntimeError("No layers found in network. ONNX parsing likely failed.")

    # Optimization Profile
    profile = builder.create_optimization_profile()
    if model_name == "transformer":
        profile.set_shape("hidden_states",           min=(1, 1, 64),      opt=(4, 77, 64),     max=(8, 128, 64))
        profile.set_shape("encoder_hidden_states",   min=(1, 512, 4096),  opt=(4, 512, 4096),  max=(8, 512, 4096))
        profile.set_shape("pooled_projections",      min=(1, 768),        opt=(4, 768),        max=(8, 768))
        profile.set_shape("timestep",                min=(1,),            opt=(4,),            max=(8,))
        profile.set_shape("img_ids",                 min=(1, 3),          opt=(4, 3),          max=(8, 3))
        profile.set_shape("txt_ids", min=(512, 3), opt=(512, 3), max=(512, 3))
        profile.set_shape("guidance",                min=(1,),            opt=(4,),            max=(8,))
    
    elif model_name == "vae":
        profile.set_shape(
            "latent",
            min=(1, 16, 32, 32),
            opt=(4, 16, 64, 64),
            max=(8, 16, 128, 128)
        )
    else:
        profile.set_shape("input_ids", min=(1, input_ids), opt=(4, input_ids), max=(8, input_ids))
    
    config.add_optimization_profile(profile)
    # print(config) 
    
    last_layer = network.get_layer(network.num_layers - 1)
    output_tensor = last_layer.get_output(0)
    if output_tensor and not output_tensor.is_network_output:
        network.mark_output(output_tensor)

    engine_bytes = builder.build_serialized_network(network, config)
    if engine_bytes is None:
        raise RuntimeError("Failed to create engine")

    with open(engine_file_path, 'wb') as f:
        f.write(engine_bytes)

    print(f"TensorRT engine created: {engine_file_path}")
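Since the builder complains that min <= common <= max is violated "on input 8", a quick pre-flight check over the profile shapes can confirm whether any of the explicitly profiled inputs is at fault before a long build. A minimal sketch (the `profiles` dict and `check_profiles` helper are illustrative, mirroring the transformer shapes above; if this check passes, "input 8" may refer to an input of the internal ForeignNode rather than one of these named network inputs):

```python
# Pre-flight check: every profile must satisfy min <= opt <= max per dimension.
# The dict mirrors the transformer profile above; adjust to your model.
profiles = {
    "hidden_states":         ((1, 1, 64),     (4, 77, 64),    (8, 128, 64)),
    "encoder_hidden_states": ((1, 512, 4096), (4, 512, 4096), (8, 512, 4096)),
    "pooled_projections":    ((1, 768),       (4, 768),       (8, 768)),
    "timestep":              ((1,),           (4,),           (8,)),
    "img_ids":               ((1, 3),         (4, 3),         (8, 3)),
    "txt_ids":               ((512, 3),       (512, 3),       (512, 3)),
    "guidance":              ((1,),           (4,),           (8,)),
}

def check_profiles(profiles):
    """Return (input_name, axis) pairs that violate min <= opt <= max."""
    bad = []
    for name, (mn, op, mx) in profiles.items():
        for axis, (lo, mid, hi) in enumerate(zip(mn, op, mx)):
            if not (lo <= mid <= hi):
                bad.append((name, axis))
    return bad

print(check_profiles(profiles))  # [] -> the explicit profiles are consistent
```

It is also worth iterating `network.get_input(i)` for `i` in `range(network.num_inputs)` and confirming that every dynamic input actually has a profile entry; an input missing from the profile can surface as this same error.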

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi,
Can you please try re-exporting the ONNX model with opset 13?
Use Netron to open model.onnx and inspect the node near /Cast.../proj_out/Add.
Look for:

  • Cast operations that convert between unsupported types (e.g., int64 -> float16)
  • Add operations that combine unusual shapes or types

You can manually replace or fuse such operations using ONNX GraphSurgeon.

Please let us know after a retry if the issue still persists.

Thanks