Segmentation fault (core dumped) when using TensorRT while quantizing Stable Diffusion 1.5 to INT8

Description

For a specific use case, I am trying to quantize Stable Diffusion 1.5 to INT8.
I used the method from TensorRT/demo/Diffusion/demo_txt2img_xl.py in the NVIDIA/TensorRT GitHub repository, changing the SD-XL checks so that they also accept SD 1.5.
But I get a segmentation fault while the ONNX model of the UNet is being optimized:

[I] Initializing StableDiffusion txt2img demo using TensorRT
[I] Autoselected scheduler: PNDM
[I] Load tokenizer pytorch model from: pytorch_model/1.5/TXT2IMG/tokenizer
[I] Exporting ONNX model: onnx_quant/clip/model.onnx
[I] Load CLIP pytorch model from: pytorch_model/1.5/TXT2IMG/text_encoder
[I] Optimizing ONNX model: onnx_quant/clip.opt/model.onnx
[I] Folding Constants | Pass 1
2024-04-30 10:41:11.861602662 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node /text_model/Unsqueeze_2
2024-04-30 10:41:11.861628465 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node /text_model/Unsqueeze_1
2024-04-30 10:41:11.861659244 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node /text_model/Unsqueeze
2024-04-30 10:41:11.861667279 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node /text_model/Unsqueeze_8
2024-04-30 10:41:11.861675424 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node /text_model/Unsqueeze_7
[I] Total Nodes | Original: 1582, After Folding: 1016 | 566 Nodes Folded
[I] Folding Constants | Pass 2
2024-04-30 10:41:15.698350946 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node /text_model/encoder/layers.11/self_attn/Unsqueeze_12
2024-04-30 10:41:15.698384169 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node /text_model/encoder/layers.11/self_attn/Unsqueeze_9
2024-04-30 10:41:15.698394366 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node /text_model/encoder/layers.11/self_attn/Unsqueeze_17
2024-04-30 10:41:15.698402047 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node /text_model/encoder/layers.11/self_attn/Unsqueeze_16

[I] Total Nodes | Original: 1016, After Folding: 840 | 176 Nodes Folded
[I] Folding Constants | Pass 3
[I] Total Nodes | Original: 840, After Folding: 840 | 0 Nodes Folded
[I] Calibrated weights not found, generating onnx_quant/unet-int8.l3.0.bs2.s30.c384.p0.4.a0.6/state_dict.pt
Replaced 846 modules to quantized modules
[I] Performing int8 calibration for 384 steps. This can take a long time.
100%|███████████████████████████████████████████| 30/30 [00:14<00:00, 2.13it/s]
100%|███████████████████████████████████████████| 30/30 [00:13<00:00, 2.29it/s]
100%|███████████████████████████████████████████| 30/30 [00:12<00:00, 2.31it/s]
100%|███████████████████████████████████████████| 30/30 [00:13<00:00, 2.28it/s]
100%|███████████████████████████████████████████| 30/30 [00:10<00:00, 2.78it/s]

[I] Exporting ONNX model: onnx_quant/unet-int8.l3.0.bs2.s30.c384.p0.4.a0.6/model.onnx
[I] Optimizing ONNX model: onnx_quant/unet-int8.l3.0.bs2.s30.c384.p0.4.a0.6.opt/model.onnx
UNetModel: original … 6632 nodes, 7481 tensors, 3 inputs, 1 outputs
UNetModel: cleanup … 6632 nodes, 7481 tensors, 3 inputs, 1 outputs
Segmentation fault (core dumped)

Environment

TensorRT Version: 8.6.1
CUDA Version: 11.8.89
CUDNN Version:
Operating System + Version: Ubuntu 22.04.2 LTS
Python Version (if applicable): Python 3.11.0
PyTorch Version (if applicable): 2.2.2+cu121
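
The cuDNN field above is blank, and the PyTorch build tag (cu121) differs from the listed CUDA version. A minimal sketch for collecting these versions in one place is shown below. The package names in `PACKAGES` are assumptions about what pip installed (for example, ONNX Runtime may be installed as `onnxruntime-gpu` instead), and the helper names are hypothetical.

```python
from importlib import metadata

# Assumed pip distribution names; adjust to match the actual environment.
PACKAGES = ("tensorrt", "torch", "onnx", "onnxruntime", "onnx-graphsurgeon")


def collect_versions(packages=PACKAGES):
    """Return {distribution name: version string or 'not installed'}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def torch_cuda_info():
    """Report the CUDA and cuDNN versions PyTorch was built with, if torch is importable."""
    try:
        import torch
    except ImportError:
        return {}
    return {
        "torch CUDA build": torch.version.cuda,
        "torch cuDNN": torch.backends.cudnn.version(),
    }


report = {**collect_versions(), **torch_cuda_info()}
for key, value in report.items():
    print(f"{key}: {value}")
```

Pasting this output into the Environment section makes version mismatches between the ONNX tooling, PyTorch, and TensorRT easier to spot.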

Steps To Reproduce

  1. git clone https://github.com/NVIDIA/TensorRT
  2. pip3 install -r TensorRT/demo/Diffusion/requirements.txt
  3. Modify demo_txt2img.py and stable_diffusion_pipeline.py so that the INT8 quantization path also accepts Stable Diffusion 1.5
  4. python3 TensorRT/demo/Diffusion/demo_txt2img_quant.py "Astronaut riding a horse on mars, HD, 4k, Highly Detailed, realistic horse," --onnx-dir "onnx_quant" --engine-dir "engine_quant" --build-static-batch
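
The log stops at "Segmentation fault (core dumped)" without a Python traceback, so it does not show which call in the ONNX optimization step crashed. One way to narrow this down, sketched here with Python's standard `faulthandler` module, is to have the interpreter dump the Python stack when the process receives SIGSEGV. The helper name `enable_crash_traceback` is hypothetical; only the `faulthandler` calls are standard library API.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=sys.stderr):
    """Dump the Python stack of all threads to `stream` on SIGSEGV and similar fatal signals."""
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


# Usage (assumption: called at the very top of the demo script,
# before the pipeline modules are imported):
#     enable_crash_traceback()
```

The same effect is available without editing any file by running the command in step 4 as `python3 -X faulthandler ...` or with the environment variable `PYTHONFAULTHANDLER=1` set. The traceback only identifies the Python frame that called into the crashing native code, not the native frame itself.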