Trouble loading a TensorRT engine through the Python API, although trtexec works fine

Description

trtexec works fine: it initializes CUDA successfully and completes inference. However, when I try to load the same engine through the Python API, CUDA initialization fails. I double-checked my environment variables and they look correct.
The error is:
[08/05/2023-12:55:05] [TRT] [W] CUDA initialization failure with error: 35
Segmentation fault
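
One way to narrow this down: in the CUDA runtime's error enum, code 35 is `cudaErrorInsufficientDriver`, meaning the CUDA runtime loaded by the process is newer than the installed driver supports. Since trtexec works, a plausible (but unconfirmed) cause is that the Python process picks up a different `libcudart` than trtexec does. The sketch below is only a diagnostic, not a fix. It uses `ctypes` to call `cudaDriverGetVersion` and `cudaRuntimeGetVersion` on whichever `libcudart` it can find. The helper names `decode_cuda_version` and `query_cuda_versions` are made up for this example, and the library it finds may not be the one the TensorRT Python package loads, so pass an explicit path if needed.

```python
import ctypes
import ctypes.util


def decode_cuda_version(v):
    """CUDA encodes versions as 1000 * major + 10 * minor (e.g. 11080 is 11.8)."""
    return v // 1000, (v % 1000) // 10


def query_cuda_versions(cudart_name=None):
    """Return ((driver_major, driver_minor), (runtime_major, runtime_minor)).

    Returns None if libcudart cannot be loaded or a query fails.
    cudart_name is a hypothetical override: a library name or full path.
    """
    name = cudart_name or ctypes.util.find_library("cudart") or "libcudart.so"
    try:
        cudart = ctypes.CDLL(name)
    except OSError:
        return None

    driver = ctypes.c_int(0)
    runtime = ctypes.c_int(0)
    # Both functions return a cudaError_t; 0 means cudaSuccess.
    if cudart.cudaDriverGetVersion(ctypes.byref(driver)) != 0:
        return None
    if cudart.cudaRuntimeGetVersion(ctypes.byref(runtime)) != 0:
        return None
    return decode_cuda_version(driver.value), decode_cuda_version(runtime.value)


if __name__ == "__main__":
    versions = query_cuda_versions()
    if versions is None:
        print("libcudart could not be loaded or queried")
    else:
        driver, runtime = versions
        print("driver supports up to CUDA %d.%d" % driver)
        print("loaded runtime is CUDA %d.%d" % runtime)
        if runtime > driver:
            print("runtime is newer than the driver supports (matches error 35)")
```

If the runtime version reported inside the Python environment is higher than the version the driver supports, that would be consistent with error 35. Running the same check against the `libcudart` that trtexec links to (for example, as reported by `ldd`) would show whether the two differ.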

Environment

TensorRT Version: 8.6.1.6
GPU Type: NVIDIA GeForce RTX 3090
NVIDIA Driver Version: 520.61.05
CUDA Version: 11.8
Operating System + Version: Ubuntu 18.04.6

Relevant Files

Hi,
Please refer to the installation steps at the link below in case you missed anything.

We also suggest using the TensorRT NGC containers to avoid system-dependency issues.

Thanks!