My Jetson Nano CPU is outperforming my GPU by miles! What is the point of the Jetson Nano?

I have an object detection network, yolov3-tiny to be exact, and I'm running inference with it. When I run inference on the GPU, I get crazy slow times like 3.1 seconds and 11 seconds! At first I thought the network was just really slow, so I ran the same neural net on the CPU. The results are staggering: the CPU averages an inference time of 0.2 seconds! Now I'm confused. This is a highly parallel network and is supposed to perform extraordinarily well on the Jetson Nano. That is why I bought the Nano over the Pi: because of its GPU. Seeing these results, I am very disappointed. Maybe I'm not doing this right and there is something I need to do before I can use the Jetson Nano's GPU properly. But if everything is working as it is supposed to, then I am very confused. How can I increase inference speed on the Jetson Nano GPU?
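One thing worth ruling out before blaming the hardware is the measurement itself: the first GPU call usually pays one-time setup costs (CUDA context creation, loading kernels), so timing a single run can look far worse than steady-state performance. Below is a minimal sketch of a warm-up-aware timing helper. The names `benchmark` and `make_fake_infer` are made up for illustration, and the fake network just sleeps to mimic a slow first call; swap in your own inference call, and if your framework runs GPU work asynchronously, make sure that call blocks until the result is ready.

```python
import time
from statistics import mean


def benchmark(infer, warmup=3, runs=10):
    """Return (first_call_seconds, steady_state_mean_seconds) for infer()."""
    # Time the very first call separately: it includes any one-time setup.
    start = time.perf_counter()
    infer()
    first = time.perf_counter() - start

    # Remaining warm-up calls are discarded.
    for _ in range(max(warmup - 1, 0)):
        infer()

    # Only these calls count toward the steady-state average.
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        timings.append(time.perf_counter() - start)
    return first, mean(timings)


def make_fake_infer(init_cost=0.05, step_cost=0.001):
    """Hypothetical stand-in for a network: slow first call, fast afterwards."""
    state = {"initialized": False}

    def infer():
        if not state["initialized"]:
            time.sleep(init_cost)  # pretend one-time GPU initialization
            state["initialized"] = True
        time.sleep(step_cost)  # pretend per-frame inference work

    return infer


if __name__ == "__main__":
    first, steady = benchmark(make_fake_infer())
    print(f"first call: {first:.4f}s, steady state: {steady:.4f}s")
```

If the steady-state number on the GPU is still slower than the CPU after warm-up, then the slowdown is real and not just an artifact of how the timing was done.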

This post mentions that yolov3-tiny can reach 17-18 FPS on the Jetson Nano:

There is also this post which mentions 20 FPS with YOLOv3 on Jetson Nano: