Trying to run a TensorRT FaceDetect model on a Jetson Nano 4 GB

Using TensorRT 8.2.19 on a Jetson Nano 4 GB.
I converted NVIDIA's FaceDetect model to a TensorRT engine (.engine).
face_detector1.py (8.6 KB)
I am trying to get the output from it, but I always get this error:
ValueError: could not broadcast input array from shape (150528) into shape (276480)
[03/13/2023-03:39:36][TRT] [E] 1: [defaultAllocator.cpp::deallocate::35] Error
Segmentation fault (core dumped)

Hi,

We have a sample that runs a serialized engine in Python.
Please first check whether your model works with that sample.

https://elinux.org/Jetson/L4T/TRT_Customized_Example#OpenCV_with_PLAN_model
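
In short, the sample deserializes the .engine file with the TensorRT Python API and runs it through PyCUDA. A minimal sketch of the first step is below (this is not the full sample; the engine path is a placeholder, not your file):

import tensorrt as trt
import pycuda.autoinit  # creates the CUDA context needed by TensorRT

ENGINE_PATH = "face_detector.engine"  # placeholder path

logger = trt.Logger(trt.Logger.INFO)

# Deserialize the serialized engine (PLAN file) into an ICudaEngine
with open(ENGINE_PATH, "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# A single execution context is enough for synchronous inference
context = engine.create_execution_context()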

Thanks.

Hello,
I tried the sample with the FaceDetect model (.engine).
I have JetPack 4.6, Ubuntu 18.04, Python 3.6, and TensorRT 8.2.19.
I always get this error:
[03/14/2023-01:33:13] [TRT] [I] [MemUsageChange] Init CUDA: CPU +224, GPU +0, now: CPU 265, GPU 3211 (MiB)
[03/14/2023-01:33:13] [TRT] [I] Loaded engine size: 9 MiB
[03/14/2023-01:33:13] [TRT] [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[03/14/2023-01:33:14] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +164, now: CPU 423, GPU 3375 (MiB)
[03/14/2023-01:33:16] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +241, GPU +168, now: CPU 664, GPU 3543 (MiB)
[03/14/2023-01:33:16] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +9, now: CPU 0, GPU 9 (MiB)
Traceback (most recent call last):
  File "nvidia_test.py", line 88, in <module>
    engine = PrepareEngine()
  File "nvidia_test.py", line 73, in PrepareEngine
    size = trt.volume(engine.get_tensor_shape(binding)) * batch
AttributeError: 'tensorrt.tensorrt.ICudaEngine' object has no attribute 'get_tensor_shape'
[03/14/2023-01:33:18] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::35] Error Code 1: Cuda Runtime (invalid argument)
Segmentation fault (core dumped)


Hi,

The script is written for the TensorRT 8.4 API; the get_tensor_* calls it uses do not exist in your TensorRT 8.2 installation, which is why you see the AttributeError.

For the TensorRT 8.2 library, please apply the following change to fall back to the binding API:

diff --git a/infer.py b/infer.py
index bf8657c..3281a60 100644
--- a/infer.py
+++ b/infer.py
@@ -44,12 +44,12 @@ def PrepareEngine():
 
     # create buffer
     for binding in engine:
-        size = trt.volume(engine.get_tensor_shape(binding)) * batch
+        size = trt.volume(engine.get_binding_shape(binding)) * batch
         host_mem = cuda.pagelocked_empty(shape=[size],dtype=np.float32)
         cuda_mem = cuda.mem_alloc(host_mem.nbytes)
 
         bindings.append(int(cuda_mem))
-        if engine.get_tensor_mode(binding)==trt.TensorIOMode.INPUT:
+        if engine.binding_is_input(binding):
             host_inputs.append(host_mem)
             cuda_inputs.append(cuda_mem)
         else:
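
After the patch, the buffer-allocation loop in PrepareEngine() looks roughly like this (a sketch only; it assumes float32 bindings and an explicit batch of 1, and that np, trt, cuda, and engine are already set up as in the sample):

batch = 1  # assumed batch size
host_inputs, cuda_inputs = [], []
host_outputs, cuda_outputs = [], []
bindings = []

for binding in engine:
    # TensorRT 8.2 exposes I/O through the binding API (no get_tensor_shape)
    size = trt.volume(engine.get_binding_shape(binding)) * batch
    host_mem = cuda.pagelocked_empty(shape=[size], dtype=np.float32)
    cuda_mem = cuda.mem_alloc(host_mem.nbytes)

    bindings.append(int(cuda_mem))
    if engine.binding_is_input(binding):   # replaces get_tensor_mode()
        host_inputs.append(host_mem)
        cuda_inputs.append(cuda_mem)
    else:
        host_outputs.append(host_mem)
        cuda_outputs.append(cuda_mem)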

Thanks.

Thanks astaLLL, you were very helpful,
but I got another problem in the Inference function.
astalll.py (1.7 KB)

[03/23/2023-01:37:48] [TRT] [I] [MemUsageChange] Init CUDA: CPU +224, GPU +0, now: CPU 265, GPU 2661 (MiB)
[03/23/2023-01:37:48] [TRT] [I] Loaded engine size: 9 MiB
[03/23/2023-01:37:48] [TRT] [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[03/23/2023-01:37:49] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +157, now: CPU 423, GPU 2818 (MiB)
[03/23/2023-01:37:50] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +241, GPU +241, now: CPU 664, GPU 3059 (MiB)
[03/23/2023-01:37:50] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +9, now: CPU 0, GPU 9 (MiB)
Traceback (most recent call last):
  File "astalll.py", line 64, in <module>
    Inference(engine)
  File "astalll.py", line 23, in Inference
    np.copyto(host_inputs[0], image.ravel())
  File "<__array_function__ internals>", line 6, in copyto
ValueError: could not broadcast input array from shape (196608) into shape (276480)
[03/23/2023-01:37:53] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::35] Error Code 1: Cuda Runtime (invalid argument)
Segmentation fault (core dumped)

Hi,

Based on the log:

np.copyto(host_inputs[0], image.ravel())
...
ValueError: could not broadcast input array from shape (196608) into shape (276480)

The input image is not the same size as the model input.
You can use OpenCV to resize the image so it matches the model's input size.
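
For example, 196608 is exactly 256x256x3 elements, while the input buffer expects 276480, so the frame you feed in is smaller than the model input. Something along these lines should fix it (a sketch; it assumes binding 0 is the input and is laid out as NCHW float32, and omits any model-specific normalization; the image path is a placeholder):

import cv2
import numpy as np

# Read the expected input size from the engine instead of hard-coding it
c, h, w = tuple(engine.get_binding_shape(0))[-3:]    # e.g. (3, H, W)

image = cv2.imread("test.jpg")                        # placeholder image path
image = cv2.resize(image, (w, h))                     # cv2.resize takes (width, height)
image = image.transpose(2, 0, 1).astype(np.float32)   # HWC -> CHW to match the binding
np.copyto(host_inputs[0], image.ravel())              # sizes now match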

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.