Description
When I run a specific engine file (YOLO v3) in python asynchronously using streams and threads, I get the following error when starting a thread:
ERROR: ../rtExt/cuda/pointwiseV2Helpers.h (538) - Cuda Error in launchPwgenKernel: 400 (invalid resource handle)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
It’s a C++ error that does not crash the run, but obviously it doesn’t apply inference in this case.
When running the same code asynchronously with a different model this runs without errors at all. So it seems to be related to a specific node that crashes when running asynchronously.
Environment
TensorRT Version: 7.1.2
GPU Type:
Nvidia Driver Version:
CUDA Version: 11.0
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable): The model was trained on tf 1.15, converted to onnx, and then converted to tensorRT engine.
Notes:
- It applies inference correctly on the onnx model
- I can apply inference on this engine file when I run the code in non-async mode
- I can run the same code asynchronously on some other engine file (standard imagenet classificatio model, for example)
- The engine file is of a YOLO v3 model
- When converting the model from onnx to trt engine, I can see “PointWiseV2” mentioned several times, i.e. “>>>>>>>>>>>>>>> Chose Runner Type: PointWiseV2 Tactic: 9”, so this hints again the specific node that might cause it.
I can’t share the model, but this is the general logic for the multithreading:
class myThread(Thread):
def __init__(self, func, args):
Thread.__init__(self)
self.func = func
self.args = args
def run(self):
print ("Starting " + self.args[0])
self.func(*self.args)
print ("Exiting " + self.args[0])
class TRTInference:
def __init__(self, engine, trt_engine_datatype, batch_size, num_classes, N_run):
self.cfx = cuda.Device(0).make_context()
stream = cuda.Stream()
trt.init_libnvinfer_plugins(TRT_LOGGER, '')
context = engine.create_execution_context()
# prepare buffer
host_inputs = []
cuda_inputs = []
host_outputs = []
cuda_outputs = []
bindings = []
for binding in engine:
size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
host_mem = cuda.pagelocked_empty(size, np.float32)
cuda_mem = cuda.mem_alloc(host_mem.nbytes)
bindings.append(int(cuda_mem))
if engine.binding_is_input(binding):
host_inputs.append(host_mem)
cuda_inputs.append(cuda_mem)
else:
host_outputs.append(host_mem)
cuda_outputs.append(cuda_mem)
# store
self.stream = stream
self.context = context
self.engine = engine
self.batch_size = batch_size
self.num_classes = num_classes
self.host_inputs = host_inputs
self.cuda_inputs = cuda_inputs
self.host_outputs = host_outputs
self.cuda_outputs = cuda_outputs
self.bindings = bindings
self.preprocess_func = preprocessing.preprocess_detector_yolo
self.N_run = N_run
trt_engine_path = args.engine
max_batch_size = args.batch_size
# deserialize engine
TRT_LOGGER = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(TRT_LOGGER)
with open(trt_engine_path, 'rb') as f:
buf = f.read()
engine = runtime.deserialize_cuda_engine(buf)
trt_inference_wrapper = TRTInference(engine,
trt_engine_datatype=trt.DataType.FLOAT,
batch_size=max_batch_size, num_classes=args.num_classes,
N_run=args.n_run)
# assign a thread for each image
threads_list = []
# but first apply warmup inference without a thread
n_warmup = args.n_warmup
for path_id, input_img_path in enumerate(filenames):
cur_thread = myThread(trt_inference_wrapper.infer_async, [input_img_path])
threads_list.append(cur_thread)
cur_thread.start()
The error occurs when I call the cur_thread.start()
Would appreciate any help why this might occur on the scenario I described above :)