Description
I received CUDA error 700: an illegal memory access was encountered.
What is the meaning of this error? Is the problem in the code or in the engine?
Environment
TensorRT Version : TensorRT-8.0.1.6_CUDA_11.3
GPU Type : Nvidia TITAN RTX
CUDA Version : 11.3
CUDNN Version : 8.2.0
Operating System + Version : Windows 10
Relevant Files
code.cpp (4.6 KB)
PetImages.zip (356.2 KB)
I can send the engine via private message; the file is too big to upload.
Steps To Reproduce
Compile and run code.cpp.
The error occurs at this line:
cudaMemcpyAsync(outputPred, outputBuffer, outputSize*sizeof(float), cudaMemcpyDeviceToHost, stream);
The output is:
no error
0
output of outputPred:
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
-4.31602e+08
an illegal memory access was encountered
700
Hi,
This could be because you are running out of memory or accessing an illegal address through a pointer.
The following similar issue may help you.
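One way to localize the faulting call: CUDA errors like 700 are reported asynchronously, so the cudaMemcpyAsync that surfaces the error is often not the call that caused it. Wrapping every CUDA runtime call in a check macro and synchronizing the stream right after enqueue narrows it down (running under compute-sanitizer narrows it further). A minimal sketch of that pattern; with the CUDA toolkit, `err` would be a `cudaError_t` (success value 0, i.e. `cudaSuccess`) and the message would come from `cudaGetErrorString(err)`:

```cpp
#include <cstdio>
#include <cstdlib>

// Wrap each CUDA runtime call so the first failing call is reported
// with its file and line instead of surfacing later at an unrelated call.
#define CHECK_CUDA(call)                                               \
    do {                                                               \
        auto err = (call);                                             \
        if (err != 0) { /* 0 == cudaSuccess */                         \
            std::fprintf(stderr, "CUDA error %d at %s:%d\n",           \
                         static_cast<int>(err), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Intended usage (illustrative, requires CUDA/TensorRT):
//   context->enqueueV2(buffers, stream, nullptr);
//   CHECK_CUDA(cudaStreamSynchronize(stream));   // fails here if enqueue faulted
//   CHECK_CUDA(cudaMemcpyAsync(outputPred, outputBuffer,
//                              outputSize * sizeof(float),
//                              cudaMemcpyDeviceToHost, stream));
//   CHECK_CUDA(cudaStreamSynchronize(stream));
```

With the extra synchronize after enqueue, the error is attributed to the inference step rather than to the later memcpy.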
Hi,
You can enable the verbose output to get more information about the error.
bool SampleDynamicReshape::build()
{
...
sample::gLogger.setReportableSeverity(nvinfer1::ILogger::Severity::kVERBOSE);
return buildPredictionEngine(builder) /*&& buildPreprocessorEngine(builder)*/;
}
We got the following log when deploying the app with batchsize=4096.
It looks like the error occurs when TensorRT wants to write out the output tensor.
So please double-check the output buffer of enqueue …
Thank you.
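To double-check the output buffer, it helps to derive the allocation size from the engine's binding dimensions rather than hard-coding it; if `outputSize` is smaller than what the engine writes, the copy-out faults exactly like this. A sketch of the size computation; the `Dims` struct here is a stand-in defined only for illustration, and with a real engine the dimensions would come from `engine->getBindingDimensions(outputIndex)`:

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for nvinfer1::Dims, defined here only for illustration.
struct Dims {
    int nbDims;   // number of used dimensions
    int d[8];     // dimension values
};

// Number of elements TensorRT writes for one execution of the binding.
// The device and host output buffers should both hold
// elementCount(dims) * sizeof(float) bytes (for a float output).
std::size_t elementCount(const Dims& dims) {
    std::size_t n = 1;
    for (int i = 0; i < dims.nbDims; ++i)
        n *= static_cast<std::size_t>(dims.d[i]);
    return n;
}
```

For example, an output of shape (12, 2) needs 24 floats; passing a buffer sized for fewer elements to enqueue is one common cause of error 700.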
How should I change my code to get more information?
My code deserializes a TRT engine rather than building one, so I do not have a builder object.
Here are some output information:
[03/23/2022-13:25:30] [I] [TRT] [MemUsageChange] Init CUDA: CPU +442, GPU +0, now: CPU 16231, GPU 1384 (MiB)
[03/23/2022-13:25:30] [I] [TRT] Loaded engine size: 118 MB
[03/23/2022-13:25:30] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 16231 MiB, GPU 1384 MiB
[03/23/2022-13:25:31] [V] [TRT] Using cublasLt as a tactic source
[03/23/2022-13:25:31] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
[03/23/2022-13:25:31] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +465, GPU +166, now: CPU 16708, GPU 1668 (MiB)
[03/23/2022-13:25:31] [V] [TRT] Using cuDNN as a tactic source
[03/23/2022-13:25:31] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +219, GPU +170, now: CPU 16927, GPU 1838 (MiB)
[03/23/2022-13:25:31] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 16926, GPU 1820 (MiB)
[03/23/2022-13:25:31] [V] [TRT] Deserialization required 1101703 microseconds.
[03/23/2022-13:25:31] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 16926 MiB, GPU 1820 MiB
[03/23/2022-13:25:31] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 16807 MiB, GPU 1820 MiB
[03/23/2022-13:25:31] [V] [TRT] Using cublasLt as a tactic source
[03/23/2022-13:25:31] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
[03/23/2022-13:25:31] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 16807, GPU 1830 (MiB)
[03/23/2022-13:25:31] [V] [TRT] Using cuDNN as a tactic source
[03/23/2022-13:25:31] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +3, GPU +8, now: CPU 16810, GPU 1838 (MiB)
[03/23/2022-13:25:31] [V] [TRT] Total per-runner device memory is 124152832
[03/23/2022-13:25:31] [V] [TRT] Total per-runner host memory is 115600
[03/23/2022-13:25:31] [V] [TRT] Allocated activation device memory of size 44753408
[03/23/2022-13:25:31] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 16811 MiB, GPU 1999 MiB
I changed my code to:
sample::gLogger.setReportableSeverity(nvinfer1::ILogger::Severity::kVERBOSE);
IRuntime* runtime = createInferRuntime(sample::gLogger);
but I receive no additional information.
I found out that inference with 11 images works, but with 12 images it does not. Why?
Hi,
It looks like you're using a static batch size; you may need to change to a dynamic batch size.
Please refer to the following docs for working with dynamic inputs.
Thank you.
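If rebuilding the engine with dynamic shapes is not an option, one workaround for a static-batch engine is to split the input into chunks no larger than the batch size the engine was built with and run inference once per chunk. A sketch of the chunking logic only; `maxBatch` would come from the engine (e.g. `getMaxBatchSize()` for an implicit-batch engine), and each chunk would be copied in, enqueued, and copied out separately:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Split numImages into per-inference chunk sizes, each at most maxBatch.
// E.g. 12 images with an engine built for batch 11 -> chunks {11, 1}.
std::vector<int> chunkSizes(int numImages, int maxBatch) {
    std::vector<int> chunks;
    for (int done = 0; done < numImages; done += maxBatch)
        chunks.push_back(std::min(maxBatch, numImages - done));
    return chunks;
}
```

This would explain the observed behavior: 11 images fit in one enqueue, while 12 exceed what the static engine (and its buffers) were sized for, so the extra image overruns the allocation.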