We deploy the same solution to different HW units with the same settings and under the same conditions.
What we observe is that the Orin Nano 8GB performs much better than the Orin NX 16GB.
For the latter, neither RAM nor CPU is a bottleneck. We compared the modes where stream frames are sent directly to the AI model: the Orin Nano can analyze 4 Full HD streams at 5 FPS each, while the NX covers only 2 cameras at 5.8 FPS each.
In the other modes, where the CPU does preprocessing and then sends frames to the AI model, the NX also falls behind: it handles 5.8 AI requests per second, while the Nano does 11. I would expect the opposite.
We use TensorRT and CUDA, but the code is our own proprietary implementation (the models are convolution-based, without attention or similar blocks).
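In case it helps narrow things down, one way to separate raw model inference speed from the rest of our pipeline is to benchmark the serialized TensorRT engine with `trtexec` on both boards (the `model.engine` filename below is a placeholder, not our actual engine):

```shell
# Benchmark the serialized engine for ~30 s and report latency/throughput.
# "model.engine" is a placeholder for the actual engine file.
/usr/src/tensorrt/bin/trtexec \
    --loadEngine=model.engine \
    --duration=30 \
    --useSpinWait
```

If `trtexec` shows the NX matching or beating the Nano on the bare engine, the slowdown is somewhere in our pipeline rather than in the GPU/DLA itself.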
Any hint or guidance is most welcome.
P.S. We previously hit a similar issue with the NX, which was apparently using only 2 of its 6 CPU cores; we had to force the remaining 4 cores online to use the device's full capacity. Here, however, all CPU cores are active.
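For completeness, this is roughly how we check the power mode and clock state on both boards (standard JetPack tools; note that nvpmodel mode IDs differ between Orin Nano and Orin NX, so the mode number below is only an example):

```shell
# Show the currently selected nvpmodel power mode.
sudo nvpmodel -q

# Select a maximum-performance mode; the ID is board-specific,
# see /etc/nvpmodel.conf for the list of modes on a given device.
sudo nvpmodel -m 0

# Lock CPU/GPU/EMC clocks to their maximum for the current mode.
sudo jetson_clocks

# Watch per-core CPU load, GPU load, and memory bandwidth live.
tegrastats
```

Both devices report all cores online and the same power mode, which is why the throughput gap surprises us.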