Description
Hi,
I’m using TensorRT for object detection (SSD). The engine runs fine, but when I move the trt_output to the GPU I get a Cuda Runtime (invalid resource handle) error.
Environment
TensorRT Version: 8.0.0.3
GPU Type: T4
Nvidia Driver Version: 450
CUDA Version: 11.0
CUDNN Version: 8.2.0
Operating System + Version: CentOS 7
Python Version (if applicable): 3.7
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.8
Baremetal or Container (if container which image + tag):
Relevant Files
Steps To Reproduce
Here is my code.
import numpy as np
import pycuda.autoinit  # creates and pushes a CUDA context for this process
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)


class HostDeviceMem:
    """Pairs a page-locked host buffer with its device buffer."""
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem


class TRTInference:
    def __init__(self, engine_file_path):
        self.engine_file_path = engine_file_path
        self.engine = self.get_engine()
        self.context = self.get_context()
        self.inputs, self.outputs, self.bindings, self.stream = self.allocate_buffers()

    def get_engine(self):
        trt.init_libnvinfer_plugins(None, '')
        with open(self.engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())

    def get_context(self):
        return self.engine.create_execution_context()

    def allocate_buffers(self):
        inputs = []
        outputs = []
        bindings = []
        stream = cuda.Stream()
        for binding in self.engine:
            # input_shape() / output_shape() (not shown here) return the element
            # count for the binding at the maximum profile shape.
            if self.engine.binding_is_input(binding):
                size = self.input_shape()
            else:
                size = self.output_shape(binding)
            dtype = trt.nptype(self.engine.get_binding_dtype(binding))
            # Allocate page-locked host and device buffers.
            host_mem = cuda.pagelocked_empty(size, dtype)
            device_mem = cuda.mem_alloc(host_mem.nbytes)
            # Append the device buffer address to the bindings list.
            bindings.append(int(device_mem))
            # Append to the appropriate list.
            if self.engine.binding_is_input(binding):
                inputs.append(HostDeviceMem(host_mem, device_mem))
            else:
                outputs.append(HostDeviceMem(host_mem, device_mem))
        return inputs, outputs, bindings, stream

    def do_inference(self, img, img_h, img_w, batch_size):
        self.inputs[0].host = img
        self.context.active_optimization_profile = 0
        self.context.set_binding_shape(0, (batch_size, img_h, img_w, 3))
        # Transfer input data from host to device.
        [cuda.memcpy_htod_async(inp.device, inp.host, self.stream) for inp in self.inputs]
        # Run inference.
        self.context.execute(batch_size=batch_size, bindings=self.bindings)
        # Transfer predictions back from device to host.
        [cuda.memcpy_dtoh_async(out.host, out.device, self.stream) for out in self.outputs]
        # Wait for the async copies to finish before reading the host buffers.
        self.stream.synchronize()
        # Return only the host outputs.
        trt_outputs = [out.host for out in self.outputs]
        return trt_outputs
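This is roughly how I call it (the engine path and the 300x300 NHWC input shape below are just placeholders for my real model):

    trt_infer = TRTInference("ssd.engine")                    # placeholder path
    img = np.random.rand(1, 300, 300, 3).astype(np.float32)   # dummy input
    trt_outputs = trt_infer.do_inference(img, img_h=300, img_w=300, batch_size=1)
    print([out.shape for out in trt_outputs])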
At this point the elements of trt_outputs are NumPy arrays. As soon as I run trt_outputs[0] = torch.from_numpy(trt_outputs[0]).cuda(), the error appears.
I believe this is related to the CUDA context: when I call .cuda(), PyTorch initializes its own CUDA context, and the next time TensorRT runs inference it ends up using the wrong context. How should I deal with this issue?
Traceback:
[TensorRT] ERROR: 1: [convolutionRunner.cpp::checkCaskExecError<false>::440] Error Code 1: Cask (Cask Convolution execution)
[TensorRT] ERROR: 1: [apiCheck.cpp::apiCatchCudaError::17] Error Code 1: Cuda Runtime (invalid resource handle)
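One direction I was thinking of trying (just a sketch, not verified): drop the pycuda.autoinit import and instead have pycuda retain the device's primary context, which is the context PyTorch attaches to, so the TensorRT buffers and the torch tensors would live in a single context. The engine path and input shape are placeholders again.

    import numpy as np
    import pycuda.driver as cuda
    import torch

    cuda.init()
    dev = cuda.Device(0)
    ctx = dev.retain_primary_context()  # primary context, the one PyTorch also uses
    ctx.push()
    try:
        trt_infer = TRTInference("ssd.engine")                    # placeholder path
        img = np.random.rand(1, 300, 300, 3).astype(np.float32)   # dummy input
        trt_outputs = trt_infer.do_inference(img, 300, 300, 1)
        dets = torch.from_numpy(trt_outputs[0]).cuda()            # same context as TensorRT now?
    finally:
        ctx.pop()

Is something like this the recommended way to share the context between PyTorch and TensorRT, or is there a better approach?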