Cuda Runtime (invalid resource handle) when using TensorRT and PyTorch (on GPU) simultaneously


I’m using TensorRT to do object detection (SSD). The engine runs well, but when I move trt_outputs to the GPU, I get Cuda Runtime (invalid resource handle).


TensorRT Version:
GPU Type: T4
Nvidia Driver Version: 450
CUDA Version: 11.0
CUDNN Version: 8.2.0
Operating System + Version: Centos7
Python Version (if applicable): 3.7
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.8
Baremetal or Container (if container which image + tag):

Here is my code.

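import tensorrt as trt
import pycuda.driver as cuda
# Assumption: the imports and the runtime below are not shown in the post.
# A CUDA context must also exist before these calls (for example via
# pycuda.autoinit); how it was created is not shown either.

runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))


class HostDeviceMem:
    # Assumed helper (as in the TensorRT Python samples): pairs a
    # page-locked host buffer with its device allocation.
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem


class TRTInference:
    def __init__(self, engine_file_path):
        self.engine_file_path = engine_file_path
        self.engine = self.get_engine()
        self.context = self.get_context()
        self.inputs, self.outputs, self.bindings, self.stream = self.allocate_buffers()

    def get_engine(self):
        trt.init_libnvinfer_plugins(None, '')
        with open(self.engine_file_path, "rb") as f:
            return runtime.deserialize_cuda_engine(f.read())

    def get_context(self):
        return self.engine.create_execution_context()

    def allocate_buffers(self):
        inputs = []
        outputs = []
        bindings = []
        stream = cuda.Stream()
        for binding in self.engine:
            # input_shape() and output_shape() are defined elsewhere (not shown).
            if self.engine.binding_is_input(binding):
                size = self.input_shape()
            else:
                size = self.output_shape(binding)
            dtype = trt.nptype(self.engine.get_binding_dtype(binding))
            # Allocate host and device buffers.
            host_mem = cuda.pagelocked_empty(size, dtype)
            device_mem = cuda.mem_alloc(host_mem.nbytes)
            # Append the device buffer to device bindings.
            bindings.append(int(device_mem))
            # Append to the appropriate list.
            if self.engine.binding_is_input(binding):
                inputs.append(HostDeviceMem(host_mem, device_mem))
            else:
                outputs.append(HostDeviceMem(host_mem, device_mem))
        return inputs, outputs, bindings, stream

    def do_inference(self, img, img_h, img_w, batch_size):
        self.inputs[0].host = img
        self.context.active_optimization_profile = 0
        self.context.set_binding_shape(0, (batch_size, img_h, img_w, 3))
        # Transfer data from CPU to the GPU.
        for inp in self.inputs:
            cuda.memcpy_htod_async(inp.device, inp.host, self.stream)
        # execute() is synchronous and does not run on self.stream,
        # so finish the upload first.
        self.stream.synchronize()
        # Run inference.
        self.context.execute(batch_size=batch_size, bindings=self.bindings)
        # Transfer predictions back from the GPU.
        for out in self.outputs:
            cuda.memcpy_dtoh_async(out.host, out.device, self.stream)
        # Wait for the copies to finish before reading the host buffers.
        self.stream.synchronize()
        # Return only the host outputs.
        trt_outputs = [out.host for out in self.outputs]
        return trt_outputs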

Now trt_outputs is a list of numpy arrays.
When I then run trt_outputs[0] = torch.from_numpy(trt_outputs[0]).cuda(), the error appears.
I believe the problem is related to the CUDA context. When I call .cuda(), PyTorch initializes a new CUDA context. When TensorRT then runs inference, it uses the wrong CUDA context. How can I deal with this issue?
[TensorRT] ERROR: 1: [convolutionRunner.cpp::checkCaskExecError<false>::440] Error Code 1: Cask (Cask Convolution execution)
[TensorRT] ERROR: 1: [apiCheck.cpp::apiCatchCudaError::17] Error Code 1: Cuda Runtime (invalid resource handle)


Could you please give more details? Are you using PyTorch and PyCUDA together in the same module?

I have a similar problem with TensorFlow. I use a TensorRT model on the GPU and then want to process some of the results with a TensorFlow convolution. The first call to context.execute_async_v2 runs correctly, after which I run the convolution. Then, for the second frame, I call context.execute_async_v2 again and it returns the same two errors as in the original post.

This problem does not occur when I do not use the convolution.

I thought that running the convolution on the CPU would fix it, but then it returns a different error:

[02/21/2022-14:48:07] [TRT] [E] 1: [context.cpp::setStream::121] Error Code 1: Cudnn (CUDNN_STATUS_MAPPING_ERROR)

Did you find a solution? I am facing the same problem.