ERROR: ../rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)

Jetson Xavier NX
Jetpack = 4.4
TensorRT = 7.1.3-1 + cuda10.2

We have a system that was working fine. We needed to separate the code base and wanted to add a queue system so that two different modules could run asynchronously.

We achieved this using Celery (with Flask and Redis), but we ended up with a pycuda driver issue:

Traceback (most recent call last):
  File "/home/nvidia/.local/lib/python3.6/site-packages/flask/app.py", line 2070, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/nvidia/.local/lib/python3.6/site-packages/flask/app.py", line 1515, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/nvidia/.local/lib/python3.6/site-packages/flask/app.py", line 1513, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/nvidia/.local/lib/python3.6/site-packages/flask/app.py", line 1499, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "ai_app.py", line 111, in start_controller
    controller.change_current_model(crop.lower(), variety.lower(), offline=offline)
  File "/media/nvidia/Agdhi-128/Seedvision-Hardware-AI/trt_inference/controller.py", line 35, in change_current_model
    self.load_model(PaddyTRTModel, crop, variety)
  File "/media/nvidia/Agdhi-128/Seedvision-Hardware-AI/trt_inference/controller.py", line 23, in load_model
    current_model = model_class(self.config)
  File "/media/nvidia/Agdhi-128/Seedvision-Hardware-AI/trt_inference/paddy_trt.py", line 53, in __init__
    self.smc = TrtModel(self.smc_model_path)
  File "/media/nvidia/Agdhi-128/Seedvision-Hardware-AI/trt_inference/model.py", line 28, in __init__
    self.inputs, self.outputs, self.bindings, self.stream = self.allocate_buffers()
  File "/media/nvidia/Agdhi-128/Seedvision-Hardware-AI/trt_inference/model.py", line 47, in allocate_buffers
    stream = cuda.Stream()
pycuda._driver.LogicError: explicit_context_dependent failed: invalid device context - no currently active context?

We solved that by following this Stack Overflow question:
pyCUDA with Flask gives pycuda._driver.LogicError: cuModuleLoadDataEx

But we got another error:
[TensorRT] ERROR: …/rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception

Solution for this:

But that solution tells us to change back to the code that gives us the previous error, so now we are stuck in between. Please help us solve this; it may be a simple bug or a nightmare.

Let me know which details you need and I will reply.

Found the solution by myself.

Glad to know the issue is resolved.
It would be much appreciated if you could share how you fixed it.

Thanks

We had to change our model.py code in two ways. To get rid of the first error, we create a context (device.make_context()) before calling cuda.Stream(). To get rid of the second error, we push the context (cfx.push()) before context.execute_async() and pop it (cfx.pop()) afterwards.
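
For anyone wondering why the push is needed at all: as far as I understand it, the CUDA driver keeps a separate stack of current contexts for each CPU thread, so a context created on one thread is not automatically current on a Flask or Celery worker thread. The sketch below is a pure-Python simulation of that bookkeeping, not pycuda itself. FakeContextStacks and observe_from_worker are hypothetical names made up for illustration.

```python
import threading


class FakeContextStacks:
    """Stand-in for the driver's bookkeeping: one stack of contexts per CPU thread."""

    def __init__(self):
        self._local = threading.local()

    def _stack(self):
        # Each thread lazily gets its own, initially empty, stack.
        if not hasattr(self._local, "stack"):
            self._local.stack = []
        return self._local.stack

    def push(self, ctx):
        self._stack().append(ctx)

    def pop(self):
        return self._stack().pop()

    def current(self):
        stack = self._stack()
        return stack[-1] if stack else None


def observe_from_worker(stacks, ctx):
    """Run in a second thread and record what is current around a push/pop pair."""
    seen = {}

    def worker():
        seen["before_push"] = stacks.current()  # nothing is current here yet
        stacks.push(ctx)
        seen["after_push"] = stacks.current()
        stacks.pop()
        seen["after_pop"] = stacks.current()

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return seen


stacks = FakeContextStacks()
stacks.push("ctx0")  # plays the role of make_context() on the main thread
result = observe_from_worker(stacks, "ctx0")
```

In this simulation the worker thread sees no current context until it pushes one itself, which matches the "no currently active context?" message in the first traceback.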

Below is the model.py code.

import numpy as np
import tensorrt as trt
# pycuda.autoinit is intentionally not imported; the CUDA context is created manually in TrtModel.__init__.
import pycuda.driver as cuda

class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()


class TrtModel:
    def __init__(self, engine_path, max_batch_size=1, dtype=np.float32):
        # Create the CUDA context explicitly instead of relying on pycuda.autoinit.
        cuda.init()
        device = cuda.Device(0)  # enter your gpu id here
        self.cfx = device.make_context()

        # The stream must be created after the context exists (this fixed the first error).
        self.stream = cuda.Stream()
        self.engine_path = engine_path
        self.dtype = dtype
        self.logger = trt.Logger(trt.Logger.WARNING)
        self.runtime = trt.Runtime(self.logger)
        self.engine = self.load_engine(self.runtime, self.engine_path)
        self.context = self.engine.create_execution_context()
        self.max_batch_size = max_batch_size
        self.inputs, self.outputs, self.bindings = self.allocate_buffers()
        
    @staticmethod
    def load_engine(trt_runtime, engine_path):
        trt.init_libnvinfer_plugins(None, "")
        with open(engine_path, 'rb') as f:
            engine_data = f.read()
        engine = trt_runtime.deserialize_cuda_engine(engine_data)
        return engine

    def allocate_buffers(self):
        inputs = []
        outputs = []
        bindings = []

        for binding in self.engine:
            size = trt.volume(self.engine.get_binding_shape(binding)) * self.max_batch_size
            host_mem = cuda.pagelocked_empty(size, self.dtype)
            device_mem = cuda.mem_alloc(host_mem.nbytes)

            bindings.append(int(device_mem))

            if self.engine.binding_is_input(binding):
                inputs.append(HostDeviceMem(host_mem, device_mem))
            else:
                outputs.append(HostDeviceMem(host_mem, device_mem))
                
        return inputs, outputs, bindings

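    def __call__(self, x: np.ndarray, batch_size=1):
        # batch_size must not exceed the max_batch_size the buffers were allocated for.
        # Make the CUDA context current on the calling thread (this fixed the second error).
        self.cfx.push()
        try:
            x = x.astype(self.dtype)

            np.copyto(self.inputs[0].host, x.ravel())

            for inp in self.inputs:
                cuda.memcpy_htod_async(inp.device, inp.host, self.stream)

            self.context.execute_async(batch_size=batch_size, bindings=self.bindings,
                                       stream_handle=self.stream.handle)

            for out in self.outputs:
                cuda.memcpy_dtoh_async(out.host, out.device, self.stream)

            self.stream.synchronize()
        finally:
            # Always pop, even if inference raises, so the context stack stays balanced.
            self.cfx.pop()
        return [out.host for out in self.outputs]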

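    def __del__(self):
        # Release TensorRT objects in reverse order of creation:
        # the execution context first, then the engine, then the runtime.
        del self.context
        del self.engine
        del self.runtime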
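
If you have several places that need the push and pop around inference, one option is to wrap the pair in a small context manager so the pop cannot be forgotten when something raises. This is only a sketch: active_context, RecordingContext and run_inference are hypothetical names, and RecordingContext is a stand-in so the example runs without a GPU. With pycuda you would pass the object returned by device.make_context() instead.

```python
from contextlib import contextmanager


@contextmanager
def active_context(ctx):
    """Push ctx for the duration of the with-block and always pop it afterwards."""
    ctx.push()
    try:
        yield ctx
    finally:
        ctx.pop()


class RecordingContext:
    """Hypothetical stand-in for a pycuda context; it only records push/pop calls."""

    def __init__(self):
        self.calls = []

    def push(self):
        self.calls.append("push")

    def pop(self):
        self.calls.append("pop")


def run_inference(ctx, fail=False):
    # The body of the with-block is where the memcpy and execute_async calls would go.
    with active_context(ctx):
        if fail:
            raise RuntimeError("simulated inference failure")
        return "ok"
```

Inside TrtModel.__call__ this would read as "with active_context(self.cfx):" followed by the copy, execute and synchronize steps.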