Description
I’m trying to convert the FLUX.1-dev ONNX model (model.onnx) to a .engine, specifically in BF16.
I have already converted the T5, VAE, and CLIP models, but I’m out of ideas for how to handle the transformer’s shape profile.
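As a sanity check on my own profiles: TensorRT requires min <= opt <= max elementwise for every dynamic input, which is what the `myelinGraphSetInputShapeProfile` error below complains about. A minimal standalone check over the transformer profile triples from my script (the validator function itself is just my own helper, not a TensorRT API):

```python
# Minimal sanity check: TensorRT requires min <= opt <= max elementwise
# for every input in an optimization profile. The shapes below mirror the
# transformer profile from my conversion script.

def check_profile(profiles):
    """Return (name, axis) pairs that violate min <= opt <= max."""
    bad = []
    for name, (mn, op, mx) in profiles.items():
        # All three shapes must have the same rank.
        assert len(mn) == len(op) == len(mx), f"rank mismatch for {name}"
        for axis, (a, b, c) in enumerate(zip(mn, op, mx)):
            if not (a <= b <= c):
                bad.append((name, axis))
    return bad

transformer_profiles = {
    "hidden_states":         ((1, 1, 64),     (4, 77, 64),    (8, 128, 64)),
    "encoder_hidden_states": ((1, 512, 4096), (4, 512, 4096), (8, 512, 4096)),
    "pooled_projections":    ((1, 768),       (4, 768),       (8, 768)),
    "timestep":              ((1,),           (4,),           (8,)),
    "img_ids":               ((1, 3),         (4, 3),         (8, 3)),
    "txt_ids":               ((512, 3),       (512, 3),       (512, 3)),
    "guidance":              ((1,),           (4,),           (8,)),
}

print(check_profile(transformer_profiles))  # [] means every triple is ordered
```

Every triple here passes, so I suspect the violation reported on "input 8" concerns a tensor my profile does not cover at all (I only call set_shape for seven inputs).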
[05/22/2025-07:52:41] [TRT] [W] Detected layernorm nodes in FP16.
[05/22/2025-07:52:41] [TRT] [W] Running layernorm after self-attention with FP16 Reduce or Pow may cause overflow. Forcing Reduce or Pow Layers in FP32 precision, or exporting the model to use INormalizationLayer (available with ONNX opset >= 17) can help preserving accuracy.
[05/22/2025-07:54:05] [TRT] [E] Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [ir_graph_builder.cpp:myelinGraphSetInputShapeProfile:254] Called with invalid shape profile, expect min <= common <= max on input 8).
[05/22/2025-07:55:30] [TRT] [E] Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [ir_graph_builder.cpp:myelinGraphSetInputShapeProfile:254] Called with invalid shape profile, expect min <= common <= max on input 8).
[05/22/2025-07:55:31] [TRT] [E] IBuilder::buildSerializedNetwork: Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/Cast.../proj_out/Add]}.)
Traceback (most recent call last):
  File "/root/workspace/convert_model.py", line 106, in <module>
    convert_to_engine(f"/root/workspace/model/FLUX.1-dev-onnx/{model_name}/1/model_copy.onnx", f"/root/workspace/model/FLUX.1-dev-onnx/{model_name}/1/model.engine", input_ids)
  File "/root/workspace/convert_model.py", line 96, in convert_to_engine
    raise RuntimeError("Failed to create engine")
RuntimeError: Failed to create engine
Environment
nvcr.io/nvidia/tensorrt:24.03-py3 container bash
TensorRT Version: 8.6.3
GPU Type: H100
Nvidia Driver Version: 550.163.01
CUDA Version: 12.4
CUDNN Version:
Operating System + Version: ubuntu 22.04
Python Version (if applicable): 3.10.12
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.7
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Steps To Reproduce
Output of pip freeze inside the container:
accelerate==1.7.0
blinker==1.4
certifi==2025.4.26
charset-normalizer==3.4.2
coloredlogs==15.0.1
cryptography==3.4.8
dbus-python==1.2.18
diffusers==0.33.1
distro==1.7.0
filelock==3.18.0
flatbuffers==25.2.10
fsspec==2025.5.0
hf-xet==1.1.2
httplib2==0.20.2
huggingface-hub==0.31.4
humanfriendly==10.0
idna==3.10
importlib-metadata==4.6.4
jeepney==0.7.1
Jinja2==3.1.6
keyring==23.5.0
launchpadlib==1.10.16
lazr.restfulclient==0.14.4
lazr.uri==1.0.6
markdown-it-py==3.0.0
MarkupSafe==3.0.2
mdurl==0.1.2
more-itertools==8.10.0
mpmath==1.3.0
netron==8.3.4
networkx==3.4.2
numpy==1.26.4
nvidia-cublas-cu12==12.6.4.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
nvidia-cufft-cu12==11.3.0.4
nvidia-cufile-cu12==1.11.1.6
nvidia-curand-cu12==10.3.7.77
nvidia-cusolver-cu12==11.7.1.2
nvidia-cusparse-cu12==12.5.4.2
nvidia-cusparselt-cu12==0.6.3
nvidia-nccl-cu12==2.26.2
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.6.77
oauthlib==3.2.0
onnx==1.18.0
onnxconverter-common==1.13.0
onnxruntime==1.22.0
onnxsim==0.4.36
packaging==25.0
pillow==11.2.1
polygraphy==0.49.22
protobuf==6.31.0
psutil==7.0.0
Pygments==2.19.1
PyGObject==3.42.1
PyJWT==2.3.0
pyparsing==2.4.7
python-apt==2.4.0+ubuntu3
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
rich==14.0.0
safetensors==0.5.3
SecretStorage==3.3.1
sentencepiece==0.2.0
six==1.16.0
ssh-import-id==5.11
sympy==1.14.0
tensorrt==10.11.0.33
tensorrt_cu12==10.11.0.33
tensorrt_cu12_bindings==10.11.0.33
tensorrt_cu12_libs==10.11.0.33
tokenizers==0.21.1
torch==2.7.0
tqdm==4.67.1
transformers==4.52.2
triton==3.3.0
typing_extensions==4.13.2
urllib3==2.4.0
wadllib==1.3.6
zipp==1.0.0
Conversion script (the relevant part of convert_model.py):

import os

import onnx
import tensorrt as trt

model_dir = os.path.dirname(onnx_file_path)
os.chdir(model_dir)

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
parser = trt.OnnxParser(network, TRT_LOGGER)

model = onnx.load(onnx_file_path)
print("opset version", model.opset_import)
print("ONNX inputs:")
for i, inp in enumerate(model.graph.input):
    shape = [
        dim.dim_value if dim.HasField("dim_value") else "?"
        for dim in inp.type.tensor_type.shape.dim
    ]
    print(f"[{i}] name: {inp.name}, shape: {shape}")
print("end of input")

with open(os.path.basename(onnx_file_path), 'rb') as model_file:
    if not parser.parse(model_file.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parsing failed.")

if network.num_layers == 0:
    raise RuntimeError("No layers found in network. ONNX parsing likely failed.")

# Optimization profile
profile = builder.create_optimization_profile()
if model_name == "transformer":
    profile.set_shape("hidden_states", min=(1, 1, 64), opt=(4, 77, 64), max=(8, 128, 64))
    profile.set_shape("encoder_hidden_states", min=(1, 512, 4096), opt=(4, 512, 4096), max=(8, 512, 4096))
    profile.set_shape("pooled_projections", min=(1, 768), opt=(4, 768), max=(8, 768))
    profile.set_shape("timestep", min=(1,), opt=(4,), max=(8,))
    profile.set_shape("img_ids", min=(1, 3), opt=(4, 3), max=(8, 3))
    profile.set_shape("txt_ids", min=(512, 3), opt=(512, 3), max=(512, 3))
    profile.set_shape("guidance", min=(1,), opt=(4,), max=(8,))
elif model_name == "vae":
    profile.set_shape(
        "latent",
        min=(1, 16, 32, 32),
        opt=(4, 16, 64, 64),
        max=(8, 16, 128, 128)
    )
else:
    profile.set_shape("input_ids", min=(1, input_ids), opt=(4, input_ids), max=(8, input_ids))
config.add_optimization_profile(profile)

last_layer = network.get_layer(network.num_layers - 1)
output_tensor = last_layer.get_output(0)
if output_tensor and not output_tensor.is_network_output:
    network.mark_output(output_tensor)

engine_bytes = builder.build_serialized_network(network, config)
if engine_bytes is None:
    raise RuntimeError("Failed to create engine")

with open(engine_file_path, 'wb') as f:
    f.write(engine_bytes)
print(f"TensorRT engine created: {engine_file_path}")
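One more note for anyone reproducing this. My understanding of the FLUX transformer inputs (an assumption based on my reading of how diffusers packs latents: an 8x VAE downsample followed by 2x2 patch packing, so each token covers a 16x16 pixel area) is that the seq dimension of hidden_states and the first dimension of img_ids should both equal the number of packed latent tokens for the target resolution:

```python
# Hypothetical helper: packed-latent token count for FLUX at a given pixel
# resolution, assuming an 8x VAE downsample followed by 2x2 patch packing.
# This is my reading of the diffusers pipeline, not an official formula.

def flux_seq_len(height: int, width: int) -> int:
    latent_h, latent_w = height // 8, width // 8      # VAE downsample
    return (latent_h // 2) * (latent_w // 2)          # 2x2 patch packing

print(flux_seq_len(1024, 1024))  # 4096
print(flux_seq_len(512, 512))    # 1024
```

If that reading is right, my opt shapes of (4, 77, 64) for hidden_states and (4, 3) for img_ids would not be consistent with each other for any real resolution, though I don't know whether that is what triggers the invalid-profile error.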
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered