CUDA error 700 - an illegal memory access was encountered

Description

I received the error CUDA error 700 - an illegal memory access was encountered.
What is the meaning of this error? Is the problem in the code or in the engine?

Environment

TensorRT Version: TensorRT-8.0.1.6_CUDA_11.3
GPU Type: Nvidia TITAN RTX
CUDA Version: 11.3
CUDNN Version: 8.2.0
Operating System + Version: Windows 10

Relevant Files

code.cpp (4.6 KB)
PetImages.zip (356.2 KB)
I can send the engine via private message. The file is too big to attach.

Steps To Reproduce

Build and run code.cpp.

The error occurs at this line:
cudaMemcpyAsync(outputPred, outputBuffer, outputSize*sizeof(float), cudaMemcpyDeviceToHost, stream);

The output is:
no error
0
output of outputPred:
-4.31602e+08
-4.31602e+08

(the same pair of garbage values repeats for all 12 images, 24 values in total)

an illegal memory access was encountered
700

Hi,

This could be due to running out of memory or accessing an illegal address through a pointer.
The following similar issue may help you.

Thank you.

How should I change my code to get more information?

Please refer to the Developer Guide :: NVIDIA Deep Learning TensorRT Documentation

I load a TRT engine in my code and do not build one, so I do not have a builder object.

Here are some output information:

[03/23/2022-13:25:30] [I] [TRT] [MemUsageChange] Init CUDA: CPU +442, GPU +0, now: CPU 16231, GPU 1384 (MiB)
[03/23/2022-13:25:30] [I] [TRT] Loaded engine size: 118 MB
[03/23/2022-13:25:30] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 16231 MiB, GPU 1384 MiB
[03/23/2022-13:25:31] [V] [TRT] Using cublasLt as a tactic source
[03/23/2022-13:25:31] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
[03/23/2022-13:25:31] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +465, GPU +166, now: CPU 16708, GPU 1668 (MiB)
[03/23/2022-13:25:31] [V] [TRT] Using cuDNN as a tactic source
[03/23/2022-13:25:31] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +219, GPU +170, now: CPU 16927, GPU 1838 (MiB)
[03/23/2022-13:25:31] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 16926, GPU 1820 (MiB)
[03/23/2022-13:25:31] [V] [TRT] Deserialization required 1101703 microseconds.
[03/23/2022-13:25:31] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 16926 MiB, GPU 1820 MiB
[03/23/2022-13:25:31] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 16807 MiB, GPU 1820 MiB
[03/23/2022-13:25:31] [V] [TRT] Using cublasLt as a tactic source
[03/23/2022-13:25:31] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
[03/23/2022-13:25:31] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 16807, GPU 1830 (MiB)
[03/23/2022-13:25:31] [V] [TRT] Using cuDNN as a tactic source
[03/23/2022-13:25:31] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +3, GPU +8, now: CPU 16810, GPU 1838 (MiB)
[03/23/2022-13:25:31] [V] [TRT] Total per-runner device memory is 124152832
[03/23/2022-13:25:31] [V] [TRT] Total per-runner host memory is 115600
[03/23/2022-13:25:31] [V] [TRT] Allocated activation device memory of size 44753408
[03/23/2022-13:25:31] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 16811 MiB, GPU 1999 MiB

I changed my code to:

sample::gLogger.setReportableSeverity(nvinfer1::ILogger::Severity::kVERBOSE);
IRuntime* runtime = createInferRuntime(sample::gLogger);

but I receive no further information.

I found out that inference with 11 images works, but with 12 images it does not. Why?

Hi,

It looks like you’re using a static batch size; you may need to change to a dynamic batch size.
Please refer to the following docs on working with dynamic inputs.

Thank you.