I developed a program in which some parts execute on the CPU and some on the GPU. I first tested it on a 32GB Jetson AGX Xavier board, where the CPU part took around 150 microseconds. When I tested the same code on a 64GB Jetson AGX Orin board, the CPU part took around 500 microseconds. I want to know why this is happening, given that the Orin board has a better clock speed. Is there any solution? I need the execution time to be lower.
I have already run the command:
sudo nvpmodel -m 0
The part of the code executing on the GPU performs very well on the Orin board.
I suggest asking this question on the relevant Jetson forum.
It could be something like the difference in L2 cache size (8 MB vs. 3 MB). But as you wrote, this is not GPU related.
Possible but seems unlikely? Other than for this one metric, the “speeds & feeds” for both CPUs suggest that Orin should be faster than Xavier.
The size of the performance difference suggests a debug build versus a release build. Other than checking build settings, profiling should reveal the root cause of the performance difference.
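Before reaching for a full profiler, it can help to confirm the numbers with a repeatable host-side measurement. The following is a minimal sketch, not the original poster's code: `median_microseconds` and `cpu_section_placeholder` are hypothetical names, and the placeholder loop merely stands in for the real CPU part. It takes the median over many runs so that a single slow first run (cold caches, clock ramp-up) does not dominate the result.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper: calls `work` `runs` times and returns the median
// wall-clock time of one call, in microseconds.
template <typename F>
double median_microseconds(F&& work, int runs = 101) {
    assert(runs > 0);
    std::vector<double> samples;
    samples.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        work();
        auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::micro>(t1 - t0).count());
    }
    // Partial sort: places the median element at index runs / 2.
    std::nth_element(samples.begin(), samples.begin() + runs / 2,
                     samples.end());
    return samples[runs / 2];
}

// Placeholder for the CPU part of the application (an assumption for
// illustration). `volatile` keeps the compiler from deleting the loop.
double cpu_section_placeholder(int n) {
    volatile double acc = 0.0;
    for (int i = 0; i < n; ++i) {
        acc = acc + i * 0.5;
    }
    return acc;
}
```

Calling `median_microseconds([] { cpu_section_placeholder(100000); })` on both boards, with the placeholder swapped for the real CPU section and the same build flags, should show whether the gap is stable across runs or only an artifact of the first iteration.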
Thanks for replying.
I am compiling the code on both systems with the same simple command:
nvcc Filename.cu -oTestOutput
I will try profiling it.
You could try providing optimization flags directly to the host compiler with -Xcompiler. For example, for Orin it might be -Xcompiler "-O3 -march=armv8.2-a -mtune=cortex-a78ae"
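One way to check whether flags passed through -Xcompiler actually reached the host compiler is to look at a predefined macro from inside the program. The sketch below assumes the host compiler is GCC or Clang, both of which define `__OPTIMIZE__` whenever an optimization level above -O0 is in effect; other host compilers may not define it. The function name `host_build_kind` is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports whether this translation unit's host code
// was compiled with optimization enabled.
// Assumption: the host compiler is GCC or Clang, which define
// __OPTIMIZE__ for any optimization level above -O0.
std::string host_build_kind() {
#if defined(__OPTIMIZE__)
    return "optimized";
#else
    return "unoptimized";
#endif
}
```

Printing the result of `host_build_kind()` at startup on both boards gives a quick indication of whether the two binaries were built with comparable host optimization settings.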
Please note that experience of participants in this sub-forum with Jetson platforms tends to be limited, and you will likely get much better and more relevant answers by asking in the Jetson forum already linked by @Robert_Crovella above.
Thanks!!
I will try setting the optimization flags.
I have posted this issue on the Jetson forum.