safeContext.cpp (184) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)

Description

I am trying to enqueue an inference task to the ExecutionContext but receive the following error:

safeContext.cpp (184) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)

This happens with both enqueue and enqueueV2.
Synchronous execution (execute()) works without producing an error.
As recommended in the best practices, I deserialize the engine from file; the model is a modified YoloV3 loaded from ONNX.
I only have one engine and one ExecutionContext at the moment, but they don't run on the application's main thread.
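
For reference, the deserialization looks roughly like this (a minimal sketch; gLogger stands in for my ILogger implementation and the engine file name is a placeholder):

#include <NvInfer.h>
#include <fstream>
#include <vector>

// Read the serialized engine from disk.
std::ifstream file("yolov3.engine", std::ios::binary | std::ios::ate);
const std::streamsize size = file.tellg();
file.seekg(0, std::ios::beg);
std::vector<char> blob(size);
file.read(blob.data(), size);

// Deserialize the engine and create a single execution context (TensorRT 7 API).
nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
nvinfer1::ICudaEngine* engine =
    runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
nvinfer1::IExecutionContext* context = engine->createExecutionContext();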

Could anyone point me in the right direction?

Environment

TensorRT Version: 7.2.1
GPU Type: GTX 1070
Nvidia Driver Version: 455.38
CUDA Version: 11.1
CUDNN Version: unsure (using an official container image)
Operating System + Version: Ubuntu 18.04
Container: nvcr.io/nvidia/tensorrt:20.10-py3

Working code

// Copy the input to the device-side input binding (index 0).
cudaMemcpyAsync(deviceBuffer[0], p_data,
                context.binding.deviceBuffer[0].getSize(),
                cudaMemcpyHostToDevice, context.stream);

// Run inference synchronously.
if (!context.context->execute(p_batchSize, &deviceBuffer[0])) {
    LOG(ERROR) << "SyncInference failed!";
}

// Copy every output binding (indices 1..N) back to the host.
for (size_t i = 1; i < deviceBuffer.size(); ++i) {
    cudaMemcpyAsync(context.binding.hostBuffer[i].get(), deviceBuffer[i],
                    context.binding.hostBuffer[i].getSize(),
                    cudaMemcpyDeviceToHost, context.stream);
}
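
Not shown above: before the host buffers are read, the queued copies have to complete. A minimal sketch of that step:

// Block until all copies queued on the stream have completed.
cudaStreamSynchronize(context.stream);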

Non-working code

// Copy the input to the device-side input binding (index 0).
cudaMemcpyAsync(deviceBuffer[0], p_data,
                context.binding.deviceBuffer[0].getSize(),
                cudaMemcpyHostToDevice, context.stream);

// Enqueue inference asynchronously on the same stream; this call
// produces the CUDNN_STATUS_MAPPING_ERROR.
if (!context.context->enqueue(p_batchSize, &deviceBuffer[0], context.stream, nullptr)) {
    LOG(ERROR) << "AsyncInference failed!";
}

// Copy every output binding (indices 1..N) back to the host.
for (size_t i = 1; i < deviceBuffer.size(); ++i) {
    cudaMemcpyAsync(context.binding.hostBuffer[i].get(), deviceBuffer[i],
                    context.binding.hostBuffer[i].getSize(),
                    cudaMemcpyDeviceToHost, context.stream);
}
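
To narrow down which call actually fails, the CUDA runtime calls can be wrapped in a check macro so a failing copy surfaces immediately (sketch below; CUDA_CHECK is my own helper, not part of TensorRT or CUDA):

#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// My own error-check helper: prints the error string and aborts.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t status_ = (call);                                      \
        if (status_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error '%s' at %s:%d\n",             \
                         cudaGetErrorString(status_), __FILE__, __LINE__); \
            std::abort();                                                  \
        }                                                                  \
    } while (0)

// Example: checked input copy.
CUDA_CHECK(cudaMemcpyAsync(deviceBuffer[0], p_data,
                           context.binding.deviceBuffer[0].getSize(),
                           cudaMemcpyHostToDevice, context.stream));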