Segmentation fault (core dumped) when using TensorRT while quantizing Stable Diffusion 1.5 to INT8

pushkarjain1009 · April 30, 2024, 2:00pm

Description

For a specific use case, I am trying to quantize Stable Diffusion 1.5 to INT8.
I followed the approach used in TensorRT/demo/Diffusion/demo_txt2img_xl.py from the NVIDIA/TensorRT GitHub repository, changing the SD-XL checks so that SD 1.5 is handled as well.
However, I get a segmentation fault while the ONNX model is being optimized. As the log below shows, the crash happens during optimization of the INT8 UNet model, right after the cleanup step.

[I] Initializing StableDiffusion txt2img demo using TensorRT
[I] Autoselected scheduler: PNDM
[I] Load tokenizer pytorch model from: pytorch_model/1.5/TXT2IMG/tokenizer
[I] Exporting ONNX model: onnx_quant/clip/model.onnx
[I] Load CLIP pytorch model from: pytorch_model/1.5/TXT2IMG/text_encoder
[I] Optimizing ONNX model: onnx_quant/clip.opt/model.onnx
[I] Folding Constants | Pass 1
2024-04-30 10:41:11.861602662 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node /text_model/Unsqueeze_2
2024-04-30 10:41:11.861628465 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node /text_model/Unsqueeze_1
2024-04-30 10:41:11.861659244 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node /text_model/Unsqueeze
2024-04-30 10:41:11.861667279 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node /text_model/Unsqueeze_8
2024-04-30 10:41:11.861675424 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node /text_model/Unsqueeze_7
[I] Total Nodes | Original: 1582, After Folding: 1016 | 566 Nodes Folded
[I] Folding Constants | Pass 2
2024-04-30 10:41:15.698350946 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node /text_model/encoder/layers.11/self_attn/Unsqueeze_12
2024-04-30 10:41:15.698384169 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node /text_model/encoder/layers.11/self_attn/Unsqueeze_9
2024-04-30 10:41:15.698394366 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node /text_model/encoder/layers.11/self_attn/Unsqueeze_17
2024-04-30 10:41:15.698402047 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node /text_model/encoder/layers.11/self_attn/Unsqueeze_16
…
[I] Total Nodes | Original: 1016, After Folding: 840 | 176 Nodes Folded
[I] Folding Constants | Pass 3
[I] Total Nodes | Original: 840, After Folding: 840 | 0 Nodes Folded
[I] Calibrated weights not found, generating onnx_quant/unet-int8.l3.0.bs2.s30.c384.p0.4.a0.6/state_dict.pt
Replaced 846 modules to quantized modules
[I] Performing int8 calibration for 384 steps. This can take a long time.
100%|███████████████████████████████████████████| 30/30 [00:14<00:00, 2.13it/s]
100%|███████████████████████████████████████████| 30/30 [00:13<00:00, 2.29it/s]
100%|███████████████████████████████████████████| 30/30 [00:12<00:00, 2.31it/s]
100%|███████████████████████████████████████████| 30/30 [00:13<00:00, 2.28it/s]
100%|███████████████████████████████████████████| 30/30 [00:10<00:00, 2.78it/s]
…
[I] Exporting ONNX model: onnx_quant/unet-int8.l3.0.bs2.s30.c384.p0.4.a0.6/model.onnx
[I] Optimizing ONNX model: onnx_quant/unet-int8.l3.0.bs2.s30.c384.p0.4.a0.6.opt/model.onnx
UNetModel: original .. 6632 nodes, 7481 tensors, 3 inputs, 1 outputs
UNetModel: cleanup .. 6632 nodes, 7481 tensors, 3 inputs, 1 outputs
Segmentation fault (core dumped)
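The log ends without a Python traceback, so it does not show which call crashed. One way to narrow this down is Python's built-in faulthandler module, which dumps the Python stack when the interpreter receives SIGSEGV. The following is a minimal sketch, not part of the demo: the helper name `run_with_faulthandler` is hypothetical, and it assumes the script is launched through the same Python interpreter.

```python
import signal
import subprocess
import sys


def run_with_faulthandler(script_args):
    """Run a Python command in a child process with faulthandler enabled.

    Returns (return_code, stderr). On POSIX systems a return code equal to
    -signal.SIGSEGV means the child was killed by a segmentation fault, and
    stderr then contains the Python stack that faulthandler dumped.
    """
    cmd = [sys.executable, "-X", "faulthandler", *script_args]
    # Output is captured rather than streamed, so progress bars are not shown live.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stderr


def crashed_with_segfault(return_code):
    """True if the child process was terminated by SIGSEGV (POSIX only)."""
    return return_code == -signal.SIGSEGV
```

For the command in this report, `script_args` would be the script path followed by its arguments, for example `["TensorRT/demo/Diffusion/demo_txt2img_quant.py", "<prompt>", "--onnx-dir", "onnx_quant", "--engine-dir", "engine_quant", "--build-static-batch"]`. Running `python3 -X faulthandler` directly on the command line has the same effect.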

Environment

TensorRT Version: 8.6.1
CUDA Version: 11.8.89
CUDNN Version:
Operating System + Version: Ubuntu 22.04.2 LTS
Python Version (if applicable): Python 3.11.0
PyTorch Version (if applicable): 2.2.2+cu121
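
Some fields above are empty, and the list does not cover the ONNX tooling that runs during the step that crashes. A small script like the one below could fill in the installed package versions. It is only a sketch: the package names in `PACKAGES` are assumptions about what is relevant to this demo, and the helper `collect_versions` is hypothetical.

```python
from importlib import metadata

# Assumed set of relevant distributions; adjust to match the actual environment.
PACKAGES = (
    "tensorrt",
    "torch",
    "onnx",
    "onnxruntime",
    "onnx-graphsurgeon",
    "polygraphy",
)


def collect_versions(packages=PACKAGES):
    """Return a dict mapping each distribution name to its installed version."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```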

Steps To Reproduce

git clone https://github.com/NVIDIA/TensorRT
pip3 install -r TensorRT/demo/Diffusion/requirements.txt
Change demo_txt2img.py and stable_diffusion_pipeline.py so that they also quantize Stable Diffusion 1.5
python3 TensorRT/demo/Diffusion/demo_txt2img_quant.py "Astronaut riding a horse on mars, HD, 4k, Highly Detailed, realistic horse," --onnx-dir "onnx_quant" --engine-dir "engine_quant" --build-static-batch

AakankshaS · May 31, 2024, 5:29pm

Hi @pushkarjain1009,
We are checking this with the Engineering team.

