I am following this tutorial to speed up object detection inference on my NVIDIA Jetson Nano.
I have done the following steps:
- Converted the .onnx model to a .plan (engine) file and saved it to disk.
- Loaded the engine using the following code:
with open(plan_path, 'rb') as f:
    engine_data = f.read()
engine = trt_runtime.deserialize_cuda_engine(engine_data)
- Allocated buffers on the host (memory) and the device (GPU) for the input and output data:
h_input_1 = cuda.pagelocked_empty(batch_size *
    trt.volume(engine.get_binding_shape(0)), dtype=trt.nptype($
print('size hinput', h_input_1.size)
h_output = cuda.pagelocked_empty(batch_size *
    trt.volume(engine.get_binding_shape(1)), dtype=trt.nptype($
print(trt.volume(engine.get_binding_shape(1)), "trt.volume(engine.get_bindin$
print('size h_output', h_output.size)
d_input_1 = cuda.mem_alloc(h_input_1.nbytes)
print(h_input_1.nbytes)
print(d_input_1)
d_output = cuda.mem_alloc(h_output.nbytes)
print(d_output, h_output.nbytes)
stream = cuda.Stream()
return h_input_1, d_input_1, h_output, d_output, stream
- Copied the input data (image) from host memory to device memory using cuda.memcpy_htod_async(d_input_1, h_input_1, stream), which completes successfully.
- Ran the object detection network on the loaded input data.
- Moved the output data from device memory to host memory using cuda.memcpy_dtoh_async(h_output, d_output, stream).
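For reference, the last three steps are usually chained on a single stream, roughly as in the sketch below. This is a minimal, assumed version and not the exact code from the tutorial or from my script: the function name run_inference and its parameters are hypothetical, `cuda` is expected to be the pycuda.driver module, and `context` an execution context created with engine.create_execution_context(). It also assumes an explicit-batch engine (hence execute_async_v2, which is available in the TensorRT 7.x/8.x releases) and that input bindings come before output bindings. Since the detectnet command in this post names two outputs (scores and boxes), the sketch takes a list of output buffers rather than a single one.

```python
def run_inference(cuda, context, inputs, outputs, stream):
    """Queue one inference pass on `stream` and wait for it to finish.

    inputs / outputs: lists of (host_buffer, device_buffer) pairs, given in
    binding-index order (assumption: all inputs precede all outputs).
    """
    # Host -> device copy of every input buffer.
    for h_buf, d_buf in inputs:
        cuda.memcpy_htod_async(d_buf, h_buf, stream)

    # TensorRT expects one raw device address per engine binding.
    bindings = [int(d_buf) for _, d_buf in inputs + outputs]
    ok = context.execute_async_v2(bindings=bindings,
                                  stream_handle=stream.handle)
    if not ok:
        raise RuntimeError("execute_async_v2 reported a failure")

    # Device -> host copy of every output buffer.
    for h_buf, d_buf in outputs:
        cuda.memcpy_dtoh_async(h_buf, d_buf, stream)

    # The calls above only enqueue work; block until the stream is done
    # before reading the host output buffers.
    stream.synchronize()
    return [h_buf for h_buf, _ in outputs]
```

With the buffers from the allocation step, a call would look like run_inference(cuda, context, [(h_input_1, d_input_1)], [(h_output, d_output)], stream), with one extra (host, device) pair per additional output binding.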
I am facing an issue in step 6, which is shown in the following image.
Moreover, I can successfully run inference using the detectnet executable with the following command line:
detectnet --model=models/person/ssd-mobilenet.onnx --labels=models/person/labels.txt \
    --input-blob=input_0 --output-cvg=scores --output-bbox=boxes \
    "$IMAGES/person/000001.jpg" $IMAGES/test/person/000001.jpg
I believe (though I am not sure) that it creates a runtime_engine to do the inference. I can also see in jtop that it uses the GPU while it runs.
I have the following questions:
- What is this CUDA error, why does it occur, and how can I solve it?
- As shown above, I can also run inference with the detectnet command. Which of the two approaches should be preferred for a real-time application, and what is the correct way to use it?
- What is the difference between the two approaches?