[TensorRT] ERROR: …/rtSafe/cuda/reformat.cu (925) - Cuda Error in NCHWToNCHHW2: 400 (invalid resource handle)

I have converted a YOLOv3 model to ONNX and from ONNX to a TensorRT engine.

Now I am trying to run inference on images received over Socket.IO. Here is my code:

```python
import os
import time
import argparse
import numpy as np
import cv2
import pycuda.autoinit  # needed for initializing the CUDA driver
import socketio
import base64

from utils.yolo_classes import get_cls_dict
from utils.camera import add_camera_args, Camera
from utils.display import open_window, set_display, show_fps
from utils.visualization import BBoxVisualization
from utils.yolo_with_plugins import TrtYOLO

conf_th = 0.3

# Load the model once at start-up
trt_yolo = TrtYOLO("yolov3-custom-416", (416, 416), 3)
print("trt_yolo ==>", trt_yolo)

WINDOW_NAME = 'TrtYOLODemo'
inputShape = (300, 300)

# Shinobi plugin variables
shinobiPluginName = "NoMask"
shinobiPluginKey = "NoMask123123"
shinobiHost = 'http://192.168.0.109:9090'

# Socket.IO client with reconnection
sio = socketio.Client(reconnection=True, reconnection_delay=1, ssl_verify=False)
sio.connect(shinobiHost, transports='websocket')
sio.emit('ocv', {'f': 'init', 'plug': shinobiPluginName, 'type': 'detector',
                 'connectionType': 'websocket', 'pluginKey': shinobiPluginKey})


# Socket.IO connect event (built-in reconnection logic)
@sio.event
def connect():
    print('connection established')
    sio.emit('ocv', {'f': 'init', 'plug': shinobiPluginName, 'type': 'detector',
                     'connectionType': 'websocket', 'pluginKey': shinobiPluginKey})


# Socket.IO reconnect event
@sio.event
def reconnect():
    print('reconnection established')
    sio.emit('ocv', {'f': 'init', 'plug': shinobiPluginName, 'type': 'detector',
                     'connectionType': 'websocket', 'pluginKey': shinobiPluginKey})


# Socket.IO disconnect event
@sio.event
def disconnect():
    print('disconnected from server')


def yolo_detection(img_np, trt_yolo, recvdImg, height, width, shinobiId, shinobiKe):
    frame = img_np
    print("trt_yolo_YOLODETECTION", trt_yolo)
    (h, w) = frame.shape[:2]
    boxes, confs, clss = trt_yolo.detect(frame, conf_th)
    print("boxes", boxes)
    print("confs", confs)
    print("clss", clss)


# 'f' event: every frame from Shinobi arrives in this handler
@sio.event
def f(data):
    shinobiId = data.get("id")
    shinobiKe = data.get("ke")
    recvdImg = data.get("frame")
    nparr = np.frombuffer(recvdImg, np.uint8)  # np.fromstring is deprecated
    print("trt_yolo ON F!! ==>", trt_yolo)
    img_np = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    # img_np = cv2.resize(img_np, inputShape, interpolation=cv2.INTER_AREA)
    # cv2.imwrite('recvdImg.jpg', img_np)
    yolo_detection(img_np, trt_yolo, recvdImg,
                   img_np.shape[0], img_np.shape[1], shinobiId, shinobiKe)


# Wait for Socket.IO events
sio.wait()
```

Hi,

Could you share a complete error log with us first?
Thanks.

Hi @AastaLLL,

Here is the error log:

```
[TensorRT] WARNING: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[TensorRT] ERROR: …/rtSafe/cuda/reformat.cu (925) - Cuda Error in NCHWToNCHHW2: 400 (invalid resource handle)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
```

Hi,

Based on the log, did you generate the TensorRT plan file on the same platform and with the same software version?
Please note that a TensorRT engine is not portable. You will need to generate the file in the same environment.
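
If there is any doubt, one way to be sure is to rebuild the engine from the ONNX file directly on the Nano. A minimal sketch with the TensorRT 7 Python API (the file names and workspace size are placeholders, not values from this thread):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def build_engine(onnx_path, engine_path):
    # Parse the ONNX model and serialize a plan file on this device
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(EXPLICIT_BATCH) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 28  # 256 MiB, conservative for the Nano
        with open(onnx_path, 'rb') as f:
            if not parser.parse(f.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
        engine = builder.build_cuda_engine(network)
        with open(engine_path, 'wb') as f:
            f.write(engine.serialize())
        return engine

build_engine('yolov3-custom-416.onnx', 'yolov3-custom-416.trt')
```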

Thanks.

Hi @AastaLLL,

Yes, I am using the same platform and the same environment (Jetson Nano).
After converting the model I checked it with local images, and the model seems to be working fine.

The issue I am facing is with the Socket.IO connection in combination with the model.

As you can see in the code I shared, I load the model once at start-up, and every time I receive a frame over the websocket I try to run inference on the received image. That is the point at which I get this error.

If I run on a local image and/or RTSP feeds, the model works fine.

Is there any mistake in how I am loading the model in the above code?

I have attached my Python code file: trt_mask_plugin.txt (3.1 KB)

Here are the complete details of the board:

NVIDIA Jetson Nano (Developer Kit Version)
L4T 32.4.4 [ JetPack UNKNOWN ]
Ubuntu 18.04.5 LTS
Kernel Version: 4.9.140-tegra
CUDA 10.2.89
CUDA Architecture: 5.3
OpenCV version: 4.1.1
OpenCV Cuda: YES
CUDNN: 8.0.0.180
TensorRT: 7.1.3.0
Vision Works: 1.6.0.501
VPI: 0.4.4

Hi @AastaLLL,

If I am not wrong, I am getting this error because I am not handling async events. Can you please help me with how to handle asynchronously received images and run inference on them?

Hi,

I am not sure I understand your problem correctly.
It seems that you are trying to run the inference inside a callback function triggered over the network.

A common error in that setup is that the CUDA context gets refreshed and mixed up with other applications.
Please store and restore (push/pop) the CUDA context around the inference in the yolo_detection function.
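
A minimal sketch of that pattern, assuming the TrtYOLO wrapper from the code above (cuda_ctx is a name introduced here, and pycuda.autoinit should be dropped so that only this one context exists):

```python
import pycuda.driver as cuda  # instead of pycuda.autoinit

cuda.init()
cuda_ctx = cuda.Device(0).make_context()  # context is now current on the main thread

# Build the engine wrapper while the context is current,
# so its buffers and stream belong to cuda_ctx.
trt_yolo = TrtYOLO("yolov3-custom-416", (416, 416), 3)
cuda_ctx.pop()  # release the context from the main thread


def yolo_detection(img_np):
    cuda_ctx.push()  # make the context current on the Socket.IO handler thread
    try:
        boxes, confs, clss = trt_yolo.detect(img_np, conf_th)
    finally:
        cuda_ctx.pop()  # always restore on the way out, even if detect() raises
    return boxes, confs, clss
```

The important part is that every push is matched by exactly one pop, so the handler thread never keeps the context after returning.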

A similar example can be found in this topic:

Thanks.

Hey @AastaLLL, I have a similar problem. Could you please take a look and let me know if the CUDA context is what is causing the issue?

So, the original workflow was as per this repository:

  • Create TensorRT backends for YOLOv4 and a feature-extraction model
  • Use asynchronous processing for object detection on the Jetson Nano
  • Use asynchronous processing to get feature embeddings for each detection on the Jetson Nano
  • Carry out object tracking

I tried to modify the code to create a new workflow incorporating a Google Coral. My aim was to run the detection on the Coral using TFLite and the feature extraction on the Nano using TensorRT:

  • Allocate buffers for the TFLite interpreter
  • Create a TensorRT backend for the feature-extraction model
  • Use synchronous processing to infer detections using TFLite
  • Use asynchronous processing to get feature embeddings for each detection on the Jetson Nano
  • Carry out object tracking

The original workflow was working perfectly, but in the new workflow I am facing the CUDA error mentioned in this thread. Is this issue due to the context being refreshed?

I did try using the CUDA context push and pop functions to make it work. The issue I am facing now is that the Jetson Nano becomes unresponsive whenever I try to push the CUDA context. I have no idea why this occurs. Do you have any tips?
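
For reference, the pattern I was aiming for looks roughly like this (feature_backend and its infer() stand in for my TensorRT feature-extraction wrapper; they are placeholders, not the actual code):

```python
import threading
import pycuda.driver as cuda

cuda.init()
cuda_ctx = cuda.Device(0).make_context()  # created once, on the main thread
# ... build the TensorRT feature-extraction backend here, inside this context ...
cuda_ctx.pop()

ctx_lock = threading.Lock()  # serialize push/pop if several threads share the context

def extract_features(detections):
    # An unmatched push (e.g. an exception before the pop) leaves the context
    # stuck on this thread, which can make later CUDA calls hang.
    with ctx_lock:
        cuda_ctx.push()
        try:
            return feature_backend.infer(detections)  # placeholder wrapper
        finally:
            cuda_ctx.pop()
```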