Will99
October 12, 2023, 11:01am
I converted an ONNX model to an FP32 TRT engine and it works, but when I convert the same ONNX model to an INT8 TRT engine, I get the errors below:
[10/12/2023-17:31:46] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation.
[10/12/2023-17:31:46] [TRT] [I] Starting Calibration.
[10/12/2023-17:33:14] [TRT] [E] 1: [executionContext.cpp::executeInternal::1177] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[10/12/2023-17:33:14] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
(the deallocate error above repeats 10 more times)
[10/12/2023-17:33:14] [TRT] [E] 3: [engine.cpp::~Engine::298] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/engine.cpp::~Engine::298, condition: mExecutionContextCounter.use_count() == 1. Destroying an engine object before destroying the IExecutionContext objects it created leads to undefined behavior.
)
[10/12/2023-17:33:14] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[10/12/2023-17:33:14] [TRT] [E] 1: [cudaResources.cpp::~ScopedCudaStream::47] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[10/12/2023-17:33:14] [TRT] [E] 2: [calibrator.cpp::calibrateEngine::1181] Error Code 2: Internal Error (Assertion context->executeV2(&bindings[0]) failed. )
What might be the issue, and how can I fix it?
Hi @Will99 ,
What is the version of TRT you are using?
Also, could you please share the ONNX model and repro steps with us?
Thanks
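In the meantime, two things worth trying: enable CUDA lazy loading (the warning at the top of your log), and rebuild the INT8 engine with trtexec to check whether the crash comes from the network itself or from your Python calibrator. A rough sketch, assuming CUDA 11.7+ and hypothetical file names:

```shell
# Enable CUDA lazy loading (CUDA 11.7+); this silences the TRT warning
# and can reduce device memory use during the build.
export CUDA_MODULE_LOADING=LAZY

# Rebuild with trtexec (ships in the TensorRT bin/ directory) to see
# whether INT8 calibration crashes without your Python calibrator.
# Hypothetical model path:
#   trtexec --onnx=model.onnx --int8 --verbose
```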
Will99
November 1, 2023, 6:34am
engine_and_onnx.zip (45.7 MB)
Hello @AakankshaS
The attachment contains the ONNX file and the corresponding FP32 engine that I created.
My environment info is below, including the TensorRT version and some other details that might be useful.
Thanks.
[11/01/2023-14:02:35] [TRT] [I] ONNX IR version: 0.0.6
[11/01/2023-14:02:35] [TRT] [I] Opset version: 11
[11/01/2023-14:02:35] [TRT] [I] Producer name: pytorch
[11/01/2023-14:02:35] [TRT] [I] Producer version: 2.0.0
[11/01/2023-14:02:35] [TRT] [I] Domain:
[11/01/2023-14:02:35] [TRT] [I] Model version: 0
[11/01/2023-14:02:35] [TRT] [I] Doc string:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.199.02   Driver Version: 470.199.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P4000        Off  | 00000000:02:00.0  On |                  N/A |
| 55%   58C    P8    10W / 105W |    495MiB /  8111MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2651      G   /usr/lib/xorg/Xorg              216MiB   |
|    0   N/A  N/A      2897      G   /usr/bin/gnome-shell             49MiB   |
|    0   N/A  N/A     11057      G   /proc/self/exe                   59MiB   |
|    0   N/A  N/A   2950504      G   …205442604626590831,262144       54MiB   |
|    0   N/A  N/A   3653051      G   …RendererForSitePerProcess       87MiB   |
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
TensorRT-8.6.1.6
Below is the get_batch function defined in the calibrator:
# assumes: import numpy as np; import pycuda.driver as cuda;
#          from typing import Sequence
def get_batch(self, names: Sequence[str], **kwargs) -> list:
    """Get one batch of calibration data, or None when exhausted."""
    if self.count >= self.dataset_length:
        return None
    input_group = self.calib_data[str(self.count)]
    ret = []
    for name in names:
        data_np = input_group[name][...].astype(np.float32)
        # Tile the tensor up to the profile's opt shape, then slice it
        # back down, so the original value distribution is preserved.
        opt_shape = self.input_shapes[name]['opt_shape']
        reps = [
            int(np.ceil(opt_s / data_s))
            for opt_s, data_s in zip(opt_shape, data_np.shape)
        ]
        data_np = np.tile(data_np, reps)
        data_np = data_np[tuple(slice(0, end) for end in opt_shape)]
        data_np = np.ascontiguousarray(data_np)
        # Allocate the device buffer for each input ('voxels',
        # 'num_points', 'coors') once and reuse it on later batches;
        # allocating on every call leaks device memory.
        if name not in self.buffers:
            self.buffers[name] = cuda.mem_alloc(data_np.nbytes)
        cuda.memcpy_htod(self.buffers[name], data_np)
        ret.append(int(self.buffers[name]))
    self.count += 1
    return ret
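The tile-and-slice padding inside get_batch can be exercised in isolation with plain NumPy; the shapes below are hypothetical, chosen only to show that the sample is repeated up to the profile's opt shape and then cropped:

```python
import numpy as np

def pad_to_opt_shape(data_np: np.ndarray, opt_shape: tuple) -> np.ndarray:
    """Tile a sample so it covers opt_shape, then slice it back down,
    preserving the value distribution of the original data."""
    reps = [int(np.ceil(o / d)) for o, d in zip(opt_shape, data_np.shape)]
    tiled = np.tile(data_np, reps)
    return np.ascontiguousarray(tiled[tuple(slice(0, o) for o in opt_shape)])

# e.g. a (3, 4) sample padded up to a profile's (5, 6) opt shape
sample = np.arange(12, dtype=np.float32).reshape(3, 4)
padded = pad_to_opt_shape(sample, (5, 6))
print(padded.shape)  # (5, 6)
```

Note that row 3 of the padded array is a repeat of row 0 of the sample, which is what keeps the calibration statistics close to the real data.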