I have configured my Jetson Orin Nano with JetPack 5.1 and CUDA 11.4, and I am running an object-detection script with YOLOv8. Compared to a similar setup on my notebook with an RTX 2070 graphics card, the performance difference is enormous in favor of the notebook, as seen in the video. Is such a difference to be expected? Both run the same model converted from PyTorch to TensorRT with half=True and simplify=True, and both support CUDA. My question stems from the fact that, on NVIDIA’s official site, the RTX 2070 has a compute capability of 7.5, while the Jetson Orin Nano has a compute capability of 8.7.
Thanks
There are many factors that could be at play here. Profiling to find the bottlenecks, or the differences between the two test cases, may be in order.

Compute capability, by itself, is almost certainly not the relevant factor, at least between 7.5 and 8.7. The Orin Nano has 8 SMs, whereas the RTX 2070 has 36. The Orin Nano has 68 GB/s of memory bandwidth, whereas the RTX 2070 has 448 GB/s. These comparisons correlate much more strongly with expected performance than compute capability does, and they heavily favor the RTX 2070. (There is also a good chance that your notebook CPU could run rings around the CPU on the Orin Nano, but whether that matters depends on where the bottlenecks are; CPU performance could well be irrelevant.)

For a given compute capability, NVIDIA can (and does) build both “large” and “small” GPUs. A large GPU of a lower-numbered compute capability can easily outperform a small GPU of a higher-numbered compute capability.

Since you are asking about the Jetson Orin Nano, you may get better assistance by asking on the Jetson Orin Nano sub-forum.
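As a quick back-of-the-envelope check, the memory-bandwidth gap alone already bounds how large a difference would be unsurprising for a memory-bound inference workload. This is a minimal sketch using only the spec figures quoted above, not a substitute for actual profiling:

```python
# Rough upper-bound estimate of the performance gap from memory bandwidth
# alone, using the published spec figures. Real-world results also depend on
# SM count, clocks, precision, and CPU-side pre/post-processing.
ORIN_NANO_BW_GBS = 68   # Jetson Orin Nano memory bandwidth, GB/s
RTX_2070_BW_GBS = 448   # RTX 2070 memory bandwidth, GB/s

bw_ratio = RTX_2070_BW_GBS / ORIN_NANO_BW_GBS
print(f"Bandwidth ratio: {bw_ratio:.1f}x in favor of the RTX 2070")
```

For a workload dominated by memory traffic, a gap of roughly this size would not be surprising on its own, before even accounting for the difference in SM count.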