Cuda Error in launchPwgenKernel when running a specific engine asynchronously

Description

When I run a specific engine file (YOLO v3) in Python asynchronously using streams and threads, I get the following error when starting a thread:
ERROR: …/rtExt/cuda/pointwiseV2Helpers.h (538) - Cuda Error in launchPwgenKernel: 400 (invalid resource handle)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception

It is a C++-level error that does not crash the run, but inference is obviously not applied in this case.
Running the same code asynchronously with a different model produces no errors at all, so the issue seems to be related to a specific node that fails when run asynchronously.

Environment

TensorRT Version: 7.1.2
GPU Type:
Nvidia Driver Version:
CUDA Version: 11.0
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable): The model was trained with TF 1.15, converted to ONNX, and then converted to a TensorRT engine.

Notes:

  • Inference runs correctly on the ONNX model
  • I can run inference on this engine file when the code runs in non-async mode
  • I can run the same code asynchronously on some other engine file (a standard ImageNet classification model, for example)
  • The engine file is of a YOLO v3 model
  • When converting the model from ONNX to a TRT engine, I can see “PointWiseV2” mentioned several times, e.g. “>>>>>>>>>>>>>>> Chose Runner Type: PointWiseV2 Tactic: 9”, which again hints at the specific node that might be causing it.

I can’t share the model, but this is the general logic for the multithreading:

import numpy as np
import pycuda.driver as cuda
import tensorrt as trt
from threading import Thread

# args, filenames and the preprocessing module are defined elsewhere in the full script
cuda.init()


class myThread(Thread):

    def __init__(self, func, args):
        Thread.__init__(self)
        self.func = func
        self.args = args

    def run(self):
        print("Starting " + self.args[0])
        self.func(*self.args)
        print("Exiting " + self.args[0])


class TRTInference:

    def __init__(self, engine, trt_engine_datatype, batch_size, num_classes, N_run):
        self.cfx = cuda.Device(0).make_context()
        stream = cuda.Stream()

        trt.init_libnvinfer_plugins(TRT_LOGGER, '')

        context = engine.create_execution_context()

        # prepare buffers
        host_inputs  = []
        cuda_inputs  = []
        host_outputs = []
        cuda_outputs = []
        bindings = []

        for binding in engine:
            size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
            host_mem = cuda.pagelocked_empty(size, np.float32)
            cuda_mem = cuda.mem_alloc(host_mem.nbytes)

            bindings.append(int(cuda_mem))
            if engine.binding_is_input(binding):
                host_inputs.append(host_mem)
                cuda_inputs.append(cuda_mem)
            else:
                host_outputs.append(host_mem)
                cuda_outputs.append(cuda_mem)

        # store
        self.stream  = stream
        self.context = context
        self.engine  = engine
        self.batch_size = batch_size
        self.num_classes = num_classes

        self.host_inputs = host_inputs
        self.cuda_inputs = cuda_inputs
        self.host_outputs = host_outputs
        self.cuda_outputs = cuda_outputs
        self.bindings = bindings

        self.preprocess_func = preprocessing.preprocess_detector_yolo
        self.N_run = N_run


# main script
trt_engine_path = args.engine
max_batch_size = args.batch_size

# deserialize engine
TRT_LOGGER = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(TRT_LOGGER)
with open(trt_engine_path, 'rb') as f:
    buf = f.read()
    engine = runtime.deserialize_cuda_engine(buf)

trt_inference_wrapper = TRTInference(engine,
                                     trt_engine_datatype=trt.DataType.FLOAT,
                                     batch_size=max_batch_size, num_classes=args.num_classes,
                                     N_run=args.n_run)

# assign a thread for each image
threads_list = []
# but first apply warmup inference without a thread
n_warmup = args.n_warmup
for path_id, input_img_path in enumerate(filenames):

    cur_thread = myThread(trt_inference_wrapper.infer_async, [input_img_path])
    threads_list.append(cur_thread)
    cur_thread.start()
The error occurs when I call cur_thread.start().
I would appreciate any help understanding why this might occur in the scenario described above :)

Hi @weissrael,
Can you please share verbose logs so we can assist you better?
Thanks!

I attach here the verbose logs for the inference and for the converter (ONNX --> engine file). The errors at the end of the inference log are repeated several times, once each time a thread starts, so I included only the first occurrence.
verbose_logs_converter_detector.txt (416.7 KB) verbose_logs_inference_detector.txt (1.8 KB)

Hi @weissrael,
The issue might be due to an environment mismatch.
Can you please try the inference in the environment you used to build the engine?
If it works in the same environment, you probably need to update the driver in the failing environment.
Thanks!

Hi,
I run inference in the same environment in which I built the engine. With this engine, inference works without errors only in synchronous mode; asynchronously (multiple streams and threads) it fails… Is it still related to drivers in this case?

Hi @weissrael
Can you try your ONNX model with trtexec to check if the issue persists?
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#trtexec
Thanks!

I tried running it with trtexec and the --threads flag, and it runs inference successfully with trtexec. Hmmm…

So it looks like the issue is with your script.

“invalid resource handle” is probably caused by the CUDA stream.
CUDA streams and CUDA pointers are bound to a CUDA context; a stream or memory allocation created in one CUDA context cannot be used with another CUDA context.
If it is, that exact error happens.
For the synchronous API, the default nullptr stream is used, and that nullptr stream works with any CUDA context.
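
In a PyCUDA-based script like the one above, a common pattern that follows from this is to make the context created in __init__ (self.cfx) current in the worker thread before touching the stream or device buffers, and to release it afterwards. A rough sketch only, with _do_inference standing in as a hypothetical helper for the usual copy / execute_async / copy-back / synchronize logic:

# rough sketch of the workaround (not a drop-in implementation): push the
# context that owns the stream/buffers before using them from a worker thread,
# and pop it when done; _do_inference is a hypothetical helper holding the
# actual inference logic
def infer_async(self, input_img_path):
    self.cfx.push()                  # bind self.cfx to the calling thread
    try:
        return self._do_inference(input_img_path)
    finally:
        self.cfx.pop()               # detach it so the thread exits cleanly

If each thread makes many inference calls, the push/pop pair can also be moved to the start and end of myThread.run instead of wrapping every call.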