Description
The same model is used on two different PCs with different GPUs and software SDKs (NVIDIA libraries and Python packages; as far as I understand, these SDK differences should not matter).
On PC#1 I can successfully generate the TRT engine; on PC#2 I cannot.
Environment
PC#1:
TensorRT Version: 8.4.0.6
GPU Type: Quadro RTX 3000
Nvidia Driver Version: R516.01 (r515_95-3) / 31.0.15.1601 (4-24-2022)
CUDA Version: 11.7
CUDNN Version: 8.1.1
Operating System + Version: Windows 10
Python Version (if applicable): 3.6.8
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): NA
Baremetal or Container (if container which image + tag): Baremetal
PC#2:
TensorRT Version: 8.4.0.6
GPU Type: GeForce 3090
Nvidia Driver Version: R511.65(r511_37-13) / 30.0.15.1165 (1-28-2022)
CUDA Version: 11.4
CUDNN Version: 8.1.1
Operating System + Version: Windows 10
Python Version (if applicable): 3.6.8
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): NA
Baremetal or Container (if container which image + tag): Baremetal
Relevant Files
Attached are:
ONNX model
TRT engine generation report from PC#1
Polygraphy report from PC#1
TRT engine generation report from PC#2
Polygraphy report from PC#2
model_rand_weights_folded.onnx (2.8 MB)
trt_engine_3090_report.txt (630.8 KB)
Polygraphy_3090_report.txt (2.5 KB)
Polygraphy_3000_report.txt (2.0 KB)
trt_engine_3000_report.txt (574.5 KB)
Steps To Reproduce
Any basic TRT Python script (based on the TRT SDK Python samples) that loads the ONNX model, parses it successfully, and then uses the builder/network APIs to generate the engine file.
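For reference, that kind of build script looks roughly like the following minimal sketch. It assumes the TensorRT 8.x Python API and an explicit-batch network; the function name `build_engine`, the output path, and the 1 GiB workspace default are illustrative assumptions, not necessarily the exact script used here.

```python
import os


def build_engine(onnx_path, engine_path, workspace_bytes=1 << 30):
    """Parse an ONNX file and serialize a TensorRT engine (sketch, TensorRT 8.x API)."""
    # Fail early, before importing TensorRT, if the model file is missing.
    if not os.path.isfile(onnx_path):
        raise FileNotFoundError(onnx_path)

    import tensorrt as trt  # imported lazily so the file check works without TRT

    # VERBOSE logging prints per-layer tactic selection details
    # (assumed to be how the attached engine reports were produced).
    logger = trt.Logger(trt.Logger.VERBOSE)
    builder = trt.Builder(logger)
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed:\n" + "\n".join(errors))

    config = builder.create_builder_config()
    # set_memory_pool_limit supersedes max_workspace_size starting with TensorRT 8.4.
    if hasattr(config, "set_memory_pool_limit"):
        config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, workspace_bytes)
    else:
        config.max_workspace_size = workspace_bytes

    # build_serialized_network typically returns None when the build fails,
    # e.g. when an error is raised during tactic selection.
    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("Engine build failed; see the verbose log for details")

    with open(engine_path, "wb") as f:
        f.write(serialized)
```

Usage would be something like `build_engine("model_rand_weights_folded.onnx", "model.engine")`, where `model.engine` is a hypothetical output name.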
Some extra details:
The error is:
1: [convolutionRunner.cpp::nvinfer1::rt::task::CaskConvolutionRunner::onShapeChange::153] Error Code 1: Cask ( Failed to update runtime arguments.)
It is easy to see that different CUDA kernels (tactics) are tested and selected for each GPU, for example:
3000 - Conv_220 Set Tactic Name: sm70_xmma_fprop_implicit_gemm_f32f32_f32f32_f32_nhwckrsc_nhwc_tilesize64x128x8_stage1_warpsize1x4x1_g1_ffma_aligna4_alignc4 Tactic: -2431551186657551688
3090 - Conv_220 Set Tactic Name: ampere_scudnn_winograd_128x128_ldg1_ldg4_relu_tile442t_nt_v1 Tactic: -6664441261382767776
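To list every layer whose selected kernel differs between the two attached engine reports, a small log diff like the following sketch can help. It assumes each tactic line follows the format shown in the excerpts above (`<layer> Set Tactic Name: <kernel> Tactic: <id>`); the helper names `parse_tactics` and `diff_tactics` are hypothetical.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumed line format, taken from the excerpts in this post:
#   "<layer> Set Tactic Name: <kernel> Tactic: <id>"
TACTIC_RE = re.compile(r"(\S+) Set Tactic Name:\s+(\S+)\s+Tactic:\s+(-?\w+)")


def parse_tactics(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map layer name -> (kernel name, tactic id) for every matching log line."""
    tactics: Dict[str, Tuple[str, str]] = {}
    for line in lines:
        match = TACTIC_RE.search(line)
        if match:
            layer, kernel, tactic_id = match.groups()
            tactics[layer] = (kernel, tactic_id)  # last occurrence wins
    return tactics


def diff_tactics(a: Dict[str, Tuple[str, str]],
                 b: Dict[str, Tuple[str, str]]) -> Dict[str, Tuple[str, str]]:
    """Return layer -> (kernel in a, kernel in b) for layers present in both that differ."""
    return {
        layer: (a[layer][0], b[layer][0])
        for layer in sorted(a.keys() & b.keys())
        if a[layer][0] != b[layer][0]
    }
```

With the attached files, the two dictionaries would come from `parse_tactics(open("trt_engine_3000_report.txt"))` and `parse_tactics(open("trt_engine_3090_report.txt"))`.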
The folded model was generated on PC#1 and copied to PC#2 for the TRT engine generation.
If I try to fold the model on PC#2 instead, I hit the following problem:
Polygraphy problem
and I get a different folded model.
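One way to see how the two folded models differ is to compare their node counts per op type, as in the sketch below. It assumes the `onnx` Python package is installed for loading; the helper names and the file names in the usage note are hypothetical.

```python
from collections import Counter
from typing import Dict


def op_histogram(onnx_path: str) -> Counter:
    """Count graph nodes per op type in an ONNX file (requires the onnx package)."""
    import onnx  # imported lazily so diff_histograms works without onnx

    model = onnx.load(onnx_path)
    return Counter(node.op_type for node in model.graph.node)


def diff_histograms(a: Counter, b: Counter) -> Dict[str, int]:
    """Return op_type -> (count in b minus count in a) for op types whose counts differ."""
    # Counter returns 0 for missing keys, so ops present in only one model are included.
    return {
        op: b[op] - a[op]
        for op in sorted(set(a) | set(b))
        if a[op] != b[op]
    }
```

For example, `diff_histograms(op_histogram("folded_pc1.onnx"), op_histogram("folded_pc2.onnx"))` would show which op types remain unfolded on one PC but not the other.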