Description
I have an ONNX model that includes the GridSampler op. Since GridSampler is not a native TensorRT op, I built the corresponding plugin (from onnxparser-trt-plugin-sample/TensorRT/plugin/gridSamplerPlugin at master · TrojanXu/onnxparser-trt-plugin-sample · GitHub) as a shared library and load it when building the engine.
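For reference, the engine is built roughly like this (a minimal sketch of my build flow; the plugin library name, model path, and workspace size below are placeholders, not the exact values from my setup):

import ctypes
import tensorrt as trt

PLUGIN_LIB = "libgrid_sampler_plugin.so"  # placeholder: path to the built plugin .so
ONNX_MODEL = "model.onnx"                 # placeholder: path to the ONNX model

# Load the plugin shared library and register plugins before parsing,
# so the parser can resolve the GridSampler node to the plugin creator.
ctypes.CDLL(PLUGIN_LIB)
logger = trt.Logger(trt.Logger.VERBOSE)
trt.init_libnvinfer_plugins(logger, "")

builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
with open(ONNX_MODEL, "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)
config.set_flag(trt.BuilderFlag.FP16)  # with this flag set, the build hangs

engine_bytes = builder.build_serialized_network(network, config)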
The model converts fine in FP32 mode, but in FP16 mode the builder gets stuck at this stage:
[10/20/2022-11:02:28] [TRT] [V] =============== Computing costs for
[10/20/2022-11:02:28] [TRT] [V] *************** Autotuning format combination: Float(10240000,40000,200,1), Float(80000,400,2,1) -> Float(10240000,40000,200,1) ***************
[10/20/2022-11:02:28] [TRT] [V] --------------- Timing Runner: grid_sampler_1021 (PluginV2)
[10/20/2022-11:02:28] [TRT] [V] Tactic: 0x0000000000000000 Time: 0.181931
[10/20/2022-11:02:28] [TRT] [V] Fastest Tactic: 0x0000000000000000 Time: 0.181931
[10/20/2022-11:02:28] [TRT] [V] >>>>>>>>>>>>>>> Chose Runner Type: PluginV2 Tactic: 0x0000000000000000
[10/20/2022-11:02:28] [TRT] [V] *************** Autotuning format combination: Half(10240000,40000,200,1), Half(80000,400,2,1) -> Half(10240000,40000,200,1) ***************
[10/20/2022-11:02:28] [TRT] [V] --------------- Timing Runner: grid_sampler_1021 (PluginV2)
It stays stuck there forever, while timing the Half format combination, with 100% GPU utilization:
$ nvidia-smi
Thu Oct 20 11:07:32 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A6000    On   | 00000000:03:00.0 Off |                  Off |
| 30%   45C    P2   105W / 300W |  10351MiB / 49140MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    737234      C   python                          10348MiB |
+-----------------------------------------------------------------------------+
What could be the reason for this?
I am running the job in a Docker container derived from nvcr.io/nvidia/pytorch:22.07-py3.
Environment
TensorRT Version: 8.4.1.5
GPU Type: NVIDIA RTX A6000
Nvidia Driver Version: 520.61.05
CUDA Version: 11.7 Update 1 Preview
CUDNN Version:
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8.13
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): Container, nvcr.io/nvidia/pytorch:22.07-py3 (see PyTorch Release Notes :: NVIDIA Deep Learning Frameworks Documentation)