Part of code executing on the CPU of a Jetson AGX Orin board takes more time than the same code on a Jetson AGX Xavier board

I developed code in which some parts execute on the CPU and some on the GPU. I initially tested it on a 32 GB Jetson AGX Xavier board, where the part running on the CPU took around 150 microseconds. When I tested the same code on a 64 GB Jetson AGX Orin board, the CPU part takes around 500 microseconds. Why is this happening, given that the Orin board has a higher clock speed? Is there any solution? I need the execution time to be lower.

I have already run the command:
sudo nvpmodel -m 0
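nvpmodel -m 0 selects power mode 0 (typically the maximum-performance mode), but the Linux cpufreq governor may still lower CPU clocks while the cores are lightly loaded. One way to see what the cores actually run at, right before the CPU part executes, is to read the standard cpufreq sysfs files, which report frequencies in kHz. The sketch below assumes that standard sysfs layout is present on both boards; read_cpufreq_mhz and current_cpu_mhz are hypothetical helper names.

```cpp
#include <fstream>
#include <string>

// Read a cpufreq sysfs value (the kernel reports it in kHz) and return it
// in MHz, or -1.0 if the file cannot be opened or parsed.
double read_cpufreq_mhz(const std::string& path) {
    std::ifstream in(path);
    long khz = 0;
    if (!(in >> khz) || khz <= 0) return -1.0;
    return khz / 1000.0;
}

// Current frequency of one core via the standard Linux cpufreq interface
// (assumes /sys/devices/system/cpu/cpuN/cpufreq/ exists on the board).
double current_cpu_mhz(int core) {
    return read_cpufreq_mhz("/sys/devices/system/cpu/cpu" + std::to_string(core) +
                            "/cpufreq/scaling_cur_freq");
}
```

Comparing current_cpu_mhz(0) with the value read from /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq on each board would show whether the CPU part is actually running at the clock speed the comparison assumes.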

The part of the code executing on the GPU performs very well on the Orin board.

I suggest asking this question on the relevant Jetson forum.

It could be something like 8 MB L2 (Xavier) vs. 3 MB L2 (Orin). But, as you wrote, it is not GPU-related.

Possible, but it seems unlikely? Apart from this one metric, the “speeds & feeds” of the two CPUs suggest that Orin should be faster than Xavier.
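One way to test the cache-size hypothesis directly is a pointer-chasing microbenchmark: measure the average latency of dependent loads while sweeping the working-set size across the 3 MB and 8 MB points mentioned above, on both boards. The sketch below is a generic latency probe, not something from this thread; the random single-cycle permutation (Sattolo's algorithm) keeps the hardware prefetcher from hiding the latency. make_cycle, ns_per_load and g_sink are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Written at the end of each measurement so the compiler cannot drop the loop.
volatile std::size_t g_sink = 0;

// Build a random permutation forming a single cycle (Sattolo's algorithm):
// following next[i] from any slot visits all n slots exactly once before
// returning to the start. The random order defeats the prefetcher, so each
// step costs roughly one load latency.
std::vector<std::size_t> make_cycle(std::size_t n, unsigned seed) {
    assert(n > 0);
    std::vector<std::size_t> next(n);
    std::iota(next.begin(), next.end(), std::size_t{0});
    std::mt19937_64 rng(seed);
    for (std::size_t i = n - 1; i > 0; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(next[i], next[pick(rng)]);
    }
    return next;
}

// Average time per dependent load in nanoseconds for a working set of about
// `working_set_bytes`, measured over `loads` pointer-chasing steps.
double ns_per_load(std::size_t working_set_bytes, std::size_t loads) {
    const std::size_t n = working_set_bytes / sizeof(std::size_t);
    assert(n > 0 && loads > 0);
    const std::vector<std::size_t> next = make_cycle(n, 42);
    std::size_t idx = 0;
    for (std::size_t i = 0; i < n; ++i) idx = next[idx];  // warm-up lap
    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < loads; ++i) idx = next[idx];
    const auto t1 = std::chrono::steady_clock::now();
    g_sink = idx;
    return std::chrono::duration<double, std::nano>(t1 - t0).count() /
           static_cast<double>(loads);
}
```

Calling ns_per_load for, say, 1, 2, 4, 8 and 16 MiB (with a few million loads each) on both boards and noting where the per-load time jumps would show whether the working set of the CPU part is likely to fall between the two L2 sizes. TLB effects also grow with the working set, so the jumps are only a rough guide.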

The size of the performance difference (more than 3x) suggests a debug build versus a release build. Besides checking the build settings, profiling should reveal the root cause of the difference.
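Before reaching for a full profiler, it can help to time the CPU part in isolation the same way on both boards: a few untimed warm-up runs (to settle caches, page faults and CPU clocks), then many timed runs, reporting the median so that occasional scheduler or interrupt outliers do not skew the comparison. This is a minimal host-side sketch that can sit in the same .cu file; median_us is a hypothetical helper name, and it complements rather than replaces a profiler.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Run `f` `warmup` times untimed, then `runs` timed iterations, and return
// the median duration in microseconds (the upper median if `runs` is even).
template <typename F>
double median_us(F&& f, int warmup, int runs) {
    assert(warmup >= 0 && runs > 0);
    for (int i = 0; i < warmup; ++i) f();
    std::vector<double> samples;
    samples.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        f();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::micro>(t1 - t0).count());
    }
    // Partially sort so that the middle element is in its sorted position.
    std::nth_element(samples.begin(), samples.begin() + runs / 2, samples.end());
    return samples[runs / 2];
}
```

For example, median_us([] { cpu_part(); }, 10, 1001), where cpu_part is a stand-in for the CPU portion of the program, gives one number per board that can be compared directly.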

Thanks for replying.
I am building the code on both systems with the same simple command:
nvcc Filename.cu -o TestOutput

I will try profiling it.

You could try providing optimization flags directly to the host compiler with -Xcompiler. For example, for Orin it might be -Xcompiler "-O3 -march=armv8.2-a -mtune=cortex-a78ae"
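To confirm that host-side flags passed with -Xcompiler actually reach the host compiler, the program can check the __OPTIMIZE__ macro, which GCC and Clang predefine for every optimization level except -O0. In an nvcc build, host code in the .cu file is compiled by the host compiler, so this reflects the host flags only; device code optimization is controlled separately. A minimal sketch; host_code_optimized and report_host_build are hypothetical helper names.

```cpp
#include <cstdio>

// True when the host compiler ran with optimization enabled
// (__OPTIMIZE__ is predefined by GCC and Clang for any level above -O0).
constexpr bool host_code_optimized() {
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}

// Print the host build mode once at start-up, so timings from the two
// boards can be compared like for like.
void report_host_build() {
    std::printf("host code optimization: %s\n",
                host_code_optimized() ? "enabled" : "disabled (-O0)");
}
```

Calling report_host_build() at the top of main and rebuilding with and without the suggested -Xcompiler flags shows whether the host code that was timed was optimized at all.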

Please note that participants in this sub-forum tend to have limited experience with Jetson platforms, and you will likely get much better and more relevant answers by asking in the Jetson forum already linked by @Robert_Crovella above.

Thanks!
I will try setting the optimization flags.
I have posted this issue on the Jetson forum.