I tested SSD MobileNet V2 on a Jetson Nano using TensorRT, following this repo: https://github.com/AastaNV/TRT_object_detection. The inference speed was fine. However, when I increased the batch size above 1, the inference time was always a multiple of the batch-size-1 time, as if nothing was running in parallel. I even increased builder.maxWorkSpace to 1 GB when building the TensorRT engine, and it still didn’t help. I was under the impression that increasing the batch size up to 32 would have almost no impact on runtime. Is it because the Jetson Nano has too little memory or too few cores and can only do detection with a batch size of 1?
It’s recommended to monitor the system status with tegrastats at the same time.
$ sudo tegrastats
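If it helps, here is a minimal Python sketch for pulling the GPU load out of the tegrastats output. It assumes each output line contains a field of the form `GR3D_FREQ <percent>%`, which may differ between JetPack releases; the function names and the 95% threshold are made up for illustration.

```python
import re

# Assumption: tegrastats reports GPU load as "GR3D_FREQ <percent>%",
# optionally followed by "@<MHz>". Adjust the pattern if your output differs.
GR3D_PATTERN = re.compile(r"GR3D_FREQ (\d+)%")


def gpu_utilization(line):
    """Return the GPU utilization percent from one tegrastats line, or None."""
    match = GR3D_PATTERN.search(line)
    return int(match.group(1)) if match else None


def is_saturated(samples, threshold=95):
    """Return True if the average of the valid samples is >= threshold.

    The threshold of 95 is an arbitrary illustrative choice.
    """
    values = [s for s in samples if s is not None]
    return bool(values) and sum(values) / len(values) >= threshold
```

You could save some tegrastats output to a file while inference is running, pass each line to `gpu_utilization`, and then check `is_saturated` on the collected values.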
Parallelism across the batch is limited by the available GPU resources.
If GPU utilization already reaches 99% with batch size 1, a larger batch size won’t give much acceleration: the GPU has no idle capacity left, so the inference time grows roughly in proportion to the batch size.
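To check whether batching gives any gain on your device, you can compare throughput across batch sizes. The sketch below is only an illustration: `batching_report` is a hypothetical helper, and the latencies you pass in have to come from your own timing runs.

```python
def batching_report(latency_s):
    """Summarize how throughput scales with batch size.

    latency_s maps batch size -> measured seconds per batch and must
    include an entry for batch size 1. Returns a dict mapping each batch
    size to (images_per_second, speedup_over_batch_1).
    """
    base_throughput = 1 / latency_s[1]
    report = {}
    for batch, seconds in sorted(latency_s.items()):
        throughput = batch / seconds
        # A speedup near 1.0 means the batch ran no faster per image
        # than batch size 1, i.e. the GPU was already saturated.
        report[batch] = (throughput, throughput / base_throughput)
    return report
```

If the speedup stays close to 1.0 for every batch size, the latency is scaling linearly with the batch, which matches what you are seeing.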