safeContext.cpp (184) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)

Description

I am trying to enqueue an inference task to the ExecutionContext but receive the following error:

safeContext.cpp (184) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)

This happens with both enqueue and enqueueV2.
Synchronous execution (execute()) works without producing an error.
As recommended in the best practices, I deserialize the engine from file; it is a modified YoloV3 loaded from ONNX.
I only have 1 engine and 1 ExecutionContext at the moment, but they don't run on the application's main thread.

Could anyone point me in the right direction?

Environment

TensorRT Version: 7.2.1
GPU Type: GTX 1070
Nvidia Driver Version: 455.38
CUDA Version: 11.1
CUDNN Version: Unsure (using an official container image)
Operating System + Version: Ubuntu 18.04
Container: nvcr.io/nvidia/tensorrt:20.10-py3

Working code

// Copy the input from host to device on the context's stream
cudaMemcpyAsync(deviceBuffer[0], p_data,
                context.binding.deviceBuffer[0].getSize(),
                cudaMemcpyHostToDevice, context.stream);

// Synchronous execution: blocks until inference has finished
if (!context.context->execute(p_batchSize, &deviceBuffer[0])) {
    LOG(ERROR) << "SyncInference failed!";
}

// Copy the outputs (bindings 1..N-1) back from device to host
for (auto i = 1; i < deviceBuffer.size(); ++i) {
    cudaMemcpyAsync(context.binding.hostBuffer[i].get(), deviceBuffer[i],
                    context.binding.hostBuffer[i].getSize(),
                    cudaMemcpyDeviceToHost, context.stream);
}

Non-working code

// Copy the input from host to device on the context's stream
cudaMemcpyAsync(deviceBuffer[0], p_data,
                context.binding.deviceBuffer[0].getSize(),
                cudaMemcpyHostToDevice, context.stream);

// Asynchronous execution: enqueue() only schedules the inference on the stream
context.context->enqueue(p_batchSize, &deviceBuffer[0], context.stream, nullptr);

// Copy the outputs (bindings 1..N-1) back from device to host
for (auto i = 1; i < deviceBuffer.size(); ++i) {
    cudaMemcpyAsync(context.binding.hostBuffer[i].get(), deviceBuffer[i],
                    context.binding.hostBuffer[i].getSize(),
                    cudaMemcpyDeviceToHost, context.stream);
}

Hi @michael.craggs and NVIDIA support team,

Did you find a solution to this issue?

I find myself in a very similar situation and I keep getting this error:

ERROR: safeContext.cpp (184) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)
ERROR: FAILED_EXECUTION: std::exception

I cannot figure out what's wrong or even what the error message is trying to say.
Googling the error resulted in some GitHub issues and posts in this forum, but none of them helped resolve this issue.

My current environment is the following (though my ultimate goal is to run inference on a Jetson device):

  • TensorRT Version: 7.2.2.3 (but I tried also with 7.2.1.6)
  • GPU Type: GeForce GTX 1080 Ti
  • Driver Version: 455.45.01
  • CUDA Version: 11.1.105
  • cuDNN Version: 8.0.5.39
  • Operating System + Version: Ubuntu 18.04

The relevant portion of code was adapted from the TensorRT samples:

// Allocate host/device buffers for all engine bindings (helper from the TensorRT samples)
samplesCommon::BufferManager buffers(engine);
auto context = SampleUniquePtr<nvinfer1::IExecutionContext>(engine->createExecutionContext());

// Copy the input to the device and run synchronous inference
buffers.copyInputToDevice();
bool status = context->executeV2(buffers.getDeviceBindings().data());

Specifically, executeV2 is generating the error.

The TensorRT engine is deserialized from a file generated by trtexec on an ONNX model, which in turn was obtained from a semantic segmentation model trained with Python and TensorFlow and converted to ONNX with tf2onnx.
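
For reference, the engine is loaded essentially as in the samples; a minimal sketch of that part (the file name and gLogger are placeholders, error checks omitted):

#include <NvInfer.h>
#include <fstream>
#include <iterator>
#include <vector>

// Read the serialized engine produced by trtexec (path is a placeholder)
std::ifstream file("model.trt", std::ios::binary);
std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                        std::istreambuf_iterator<char>());

// Deserialize with the TensorRT runtime (gLogger is the usual sample logger)
nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
nvinfer1::ICudaEngine* engine =
    runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);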

Please let me know if I can provide further details to help debug this issue.
Thanks!

Hi @minervini.massimo, @michael.craggs,
Can you please share your model and the script so we can assist you better?

Thanks!

Hi @AakankshaS,

Thank you for offering help.

In the meantime, I downgraded my environment to CUDA 10.2, because I was also planning to use the VPI library (version 0.4.4) and just realized that it only supports CUDA versions up to 10.2.
Thus, my current environment is the following:

  • Linux distro and version: Ubuntu 18.04
  • GPU Type: GeForce GTX 1080 Ti
  • Nvidia driver version: 455.45.01
  • CUDA version: 10.2.89
  • CUDNN version: 8.0.4
  • Python version: 3.6.9
  • Tensorflow version: 2.3.1 (built from source)
  • TensorRT version: 7.2.1.6 (I also built the TensorRT-OSS components from release/7.2 branch)

With CUDA 10.2 things got even worse, because now not even trtexec works anymore.
I tried different combinations of cuDNN, TensorFlow and TensorRT versions without success.

The code snippet I posted in my previous comment is contained in a ROS node, not a standalone program.
I'm just adapting code from the standard samples, nothing fancy, to load a TensorRT engine and perform inference.
If you think it might help, tomorrow I can put together a minimal example and post it here.

I enclose below the links to download a model for image segmentation equivalent to the one I'm using (same architecture, but random weights).
It's in ONNX format and was generated with keras2onnx on a Keras-TensorFlow model.
I exported the model with different opsets, because they produce different errors with trtexec:

When I run trtexec on a model:

/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --explicitBatch --saveEngine=model.trt --verbose

With opsets 9 and 10 I get:

 ----- Parsing of ONNX model model.onnx is Done ---- 
Segmentation fault (core dumped)

Instead, with opset 11 I get:

terminate called after throwing an instance of 'std::out_of_range'
  what():  Attribute not found: pads
Aborted (core dumped)

Hi,
sorry for the late reply.
I am not sure what exactly made this issue go away, as I rewrote the buffer/memory handling classes.
The other curiosity I encountered is that enqueue returns false although execution works and the results are correct; I am not sure whether this is an API bug or what should be handled in this case.

Anyway, I highly recommend using cuda-gdb, as it provides valuable insight into what is going wrong.
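For example, something like this (illustration only, reusing the trtexec invocation from earlier in the thread):

cuda-gdb --args /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --explicitBatch --verbose
(cuda-gdb) run
(cuda-gdb) bt

run starts the program under the debugger, and bt prints the backtrace once it stops on the error.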

Cheers

Hi @michael.craggs,

Thank you very much for the advice!

Currently, with CUDA 10.2, I'm stuck at converting the ONNX model to a TensorRT engine with trtexec, before I even get to any of my own code.
Thus, I re-compiled TensorRT-OSS with CMAKE_BUILD_TYPE=Debug and tried to run trtexec_debug with cuda-gdb.
If I run it, passing the ONNX model with opset 10 as the argument to --onnx, it stops at:

Thread 1 "trtexec_debug" received signal SIGSEGV, Segmentation fault.
0x00007fffe7210438 in vtable for __cxxabiv1::__si_class_type_info () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6

With backtrace:

#0  0x00007fffe7210438 in vtable for __cxxabiv1::__si_class_type_info () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00005555555636f1 in sample::networkToEngine (build=..., sys=..., builder=..., network=..., err=...) at /home/massimo/Downloads/NVIDIA/TensorRT/samples/common/sampleEngines.cpp:471
#2  0x00005555555646f4 in sample::modelToEngine (model=..., build=..., sys=..., err=...) at /home/massimo/Downloads/NVIDIA/TensorRT/samples/common/sampleEngines.cpp:620
#3  0x0000555555564f71 in sample::getEngine (model=..., build=..., sys=..., err=...) at /home/massimo/Downloads/NVIDIA/TensorRT/samples/common/sampleEngines.cpp:698
#4  0x00005555555a9ffd in main (argc=2, argv=0x7fffffffd978) at /home/massimo/Downloads/NVIDIA/TensorRT/samples/opensource/trtexec/trtexec.cpp:149

Inspecting frame 1 highlights:

471	    config->setProfilingVerbosity(build.nvtxMode);

Printing build.nvtxMode shows nvinfer1::ProfilingVerbosity::kDEFAULT, so apparently nothing out of the ordinary. I am not sure how to extract further information from the debugger (any advice is welcome).
I hope this can help @AakankshaS to identify the issue.

Have you tried running/developing inside a container? It may not be as convenient to set up and develop in, but it saves you from having to set up all the dependencies/versions correctly yourself.
TensorRT | NVIDIA NGC (you will have to look for the release with the version you want).

As for development inside containers, I use Visual Studio Code Remote Development ("Developing inside a Container"); it is not perfect, but a good start :)

Cheers

Thank you for the pointers!
Actually, I have seen mentions of containers on forums, but I haven't tried developing inside a container myself yet.
If that makes it easier to fit together all the versions and dependencies, I will certainly give it a try, because lately I've been spending far more time compiling/installing every possible version of the libraries than actually developing my application code…

Best,
Massimo

Yes it does; everything should be preinstalled in the base image. Derive your container from one of the official images and install whatever you need on top.
It is also a cheap way to quickly test/benchmark a model with trtexec if you don't have TensorRT installed locally.
Note: you will have to install the NVIDIA container runtime tools and pass through the GPU with "--gpus all" when running.
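
For example, something along these lines (the image tag is the one mentioned earlier in this thread; adjust it to the release you need and mount your sources as you like):

docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt:20.10-py3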

cheers

I had this problem in the TLT v3 container when I tried to load two models, a Torch model and a TRT model, in my program with 2 GPUs.
To solve it, I did the following before loading my TRT model:

import torch

torch.cuda.initialized = True  # workaround flag, set before touching the TRT engine
torch.cuda.is_available()      # make sure torch sees the CUDA devices
torch.cuda.set_device(0)       # select GPU 0 for torch
# load your trt model

Then I run my Python code with this command:
CUDA_VISIBLE_DEVICES=0 python test_models.py


@MediaJ's solution worked for me.