"Cuda Error in NCHWTONCHHW2: 33 (invalid resource handle)", How to solve it?

What’s the problem?


cudaErrorInvalidResourceHandle = 400
This indicates that a resource handle passed to the API call was not valid. Resource handles are opaque types like cudaStream_t and cudaEvent_t.

The error indicates that you may be using an invalid CUDA stream or CUDA event.
May I know how you ran into this error?


I need to run the TensorRT engine many times. Currently my code loads the engine only once, at startup, and keeps it in a variable. The error is then reported on the second inference. What should I do?

The code looks something like this:

    if index == 0:
        print('first allocate_buffers')
        inputs, outputs, bindings, stream = common.allocate_buffers(engine)

The first inference runs normally, but the second one reports the error. What should I do?


Incidentally, I also load another model (a Keras tracking model). Could the error be caused by insufficient memory, and if so, how can I solve it?


You can reuse these buffers instead of allocating new ones for each inference:

inputs, outputs, bindings, stream = common.allocate_buffers(engine)

Could you give it a try?
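
The reuse pattern suggested here can be sketched as follows. This is only an illustration, not the TensorRT sample code: `allocate_buffers` and `infer` are hypothetical stand-ins for `common.allocate_buffers` and `common.do_inference`, and the fake versions exist solely so the sketch runs without a GPU.

```python
def run_inference_loop(frames, allocate_buffers, infer):
    """Allocate the buffers once, then reuse them for every frame."""
    buffers = allocate_buffers()  # done a single time, before the loop
    results = []
    for frame in frames:
        results.append(infer(buffers, frame))
    return results


# Hypothetical stand-ins (assumptions, not real TensorRT calls).
allocations = []


def fake_allocate_buffers():
    allocations.append("allocated")  # record each allocation
    return {"host_input": None}


def fake_infer(buffers, frame):
    buffers["host_input"] = frame  # overwrite the reused input buffer
    return frame * 2


outputs = run_inference_loop([1, 2, 3], fake_allocate_buffers, fake_infer)
```

Moving the allocation out of the loop, rather than guarding it with an index check inside the loop, makes it harder to re-allocate by accident.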


That is what I already do: I call allocate_buffers only on the first iteration of the loop, and I still get the error above.


We want to reproduce this issue in our environment to check it further.
Would you mind sharing a simple script that reproduces it?


I load a tracking model (a .pb file) and a detection model (TensorRT). When the detection model runs inference for the second time, a CUDA error is reported. The code is as follows:

model_filename = 'model_data/mars-small128.pb'

        metric = nn_matching.NearestNeighborDistanceMetric("cosine", max_cosine_distance, nn_budget)
        tracker = Tracker(metric)

        conn = redis.Redis(host='', port=6379, decode_responses=True)

        with self.get_engine(onnx_file_path, batch_size, fp16_on,
                             engine_file_path) as engine, engine.create_execution_context() as context:
            while True:
                    # if conn.get('time_interval') is not None:
                    #     continue

                    t1 = time.time()
                    print('start_time:' + str(time.time()))
                    # frame[:, :, [0, 1, 2]] = frame[:, :, [2, 1, 0]]
                    global current_frame
                    frame = copy.deepcopy(current_frame)
                    if frame is not None:
                        image = Image.fromarray(frame)
                        # print(time.time())
                        if index == 0:
                            b = BytesIO()
                            inputs, outputs, bindings, stream = common.allocate_buffers(engine)
                        # image.save(b, format="jpeg")

                        # print(time.time())

                        images = []
                        images_raw = []

                        image_raw, image = preprocessor.process(image)
                        # print(182)
                        # print(time.time())


                        index += 1
                        # if index != nums and len(images_raw) != batch_size:
                        #     continue
                        images_batch = np.concatenate(images, axis=0)
                        # print(time.time())

                        inputs[0].host = images_batch
                        # print(input_size)
                        # print(inputs)
                        # print(outputs)
                        print('common.do_inference start')
                        trt_outputs = common.do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs,

My code is adapted from this repository:
Qidian213/deep_sort_yolov3: Real-time Multi-person tracker using YOLO v3 and deep_sort with tensorflow

This problem has been bothering me for a long time. The options I am considering are:

  1. Resolve the GPU memory contention between the .pb model and the TensorRT engine?
  2. Convert the tracking model’s .pb file to TensorRT?

I need your help. Thank you

To be clear, the code works when the tracking model (.pb file) is not loaded.

How to solve this problem?

Hello, how can I solve it?


Sorry for keeping you waiting.
Would you mind sharing the complete source so we can reproduce this more easily?



Just for your reference:
Another topic reports the same “NCHWTONCHHW2: 33” error.

The root cause there was that the CUDA context was closed by another framework when it terminated (TensorFlow in their case).
Is it possible that the CUDA context in your app is also being closed by another framework?
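
One way to guard against a context being swapped out or closed by another framework is to make the intended CUDA context current around each inference call. The sketch below is an assumption-laden illustration rather than a confirmed fix for this thread: `active_cuda_context` is a hypothetical helper, and `ctx` is assumed to be an object with `push()` and `pop()` methods, such as a PyCUDA context.

```python
from contextlib import contextmanager


@contextmanager
def active_cuda_context(ctx):
    """Make ctx the current CUDA context for the duration of the block."""
    ctx.push()
    try:
        yield ctx
    finally:
        ctx.pop()  # always restore the previous context, even on error


# Hypothetical usage with PyCUDA (assumed setup, not run here):
#   import pycuda.driver as cuda
#   cuda.init()
#   trt_ctx = cuda.Device(0).make_context()  # created and made current
#   ... build the engine, buffers and stream while trt_ctx is current ...
#   trt_ctx.pop()
#   with active_cuda_context(trt_ctx):
#       trt_outputs = common.do_inference(...)
```

Streams and buffers are tied to the context that was current when they were created, so they would need to be created and used under the same context for this to help.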


Can you give some specific, practical advice? How can I use both the TensorRT and .pb models?
Is it a matter of controlling GPU memory, or is it necessary to convert the .pb model to TensorRT? Thanks for your advice.

I have read the issue you shared, but that problem is a little different from mine: it only runs a simple sess.run call, whereas I want to run the whole .pb tracking model. I have not solved this problem yet.

This problem still has not been solved. How should I deal with it?
See my other topic:
"Cuda Error in NCHWTONCHHW2: 33 (invalid resource handle) ", How to solve it? - Jetson & Embedded Systems/Jetson Nano - NVIDIA Developer associations

My idea is to run the other model on the CPU, but I cannot find a way to run TensorFlow on the CPU on the Jetson Nano. Can you recommend a tutorial?


You can set this environment variable to force TensorFlow to run in CPU mode:
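
The snippet from the original reply is not preserved here. One commonly used option, which may or may not be what was meant, is to hide the GPU through `CUDA_VISIBLE_DEVICES` before TensorFlow is imported. That variable applies to the whole process, so it would also hide the GPU from TensorRT in the same process. The sketch below therefore sets it only for a child process; `cpu_only_env` is a hypothetical helper, and the child here just echoes the variable instead of loading a model.

```python
import os
import subprocess
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment with CUDA devices hidden."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = "-1"  # assumption: "-1" hides all GPUs
    return env


# Launch a child process with the modified environment. A real child
# would import TensorFlow and run the tracking model on the CPU.
child = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
    env=cpu_only_env(),
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = child.stdout.strip()
```

The parent process keeps its own environment, so TensorRT there can still use the GPU while the child is restricted to the CPU.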


Here is a good tutorial for your reference:


I mean that I have not installed a CPU build of TensorFlow for the Jetson Nano, and I need to install one first. I searched but could not find such a package. How can I install it?