CUDA performance degradation from Xavier to Orin

Hi,
I have a C++ application that relies heavily on CUDA for GPU acceleration, running on the Jetson Xavier NX with CUDA 10.2. I recently upgraded the board to the Orin NX with CUDA 11.4. With the application compiled for the native CMake CUDA architectures on each board, performance on the Orin is significantly worse. I’m looking into using some of the profiler tools to see how GPU usage and runtime differ between the two boards, but I’m hoping someone has an idea off the top of their head why performance might degrade when moving to a newer and more powerful GPU.
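Before a full profiler run, comparing per-stage wall-clock times on the two boards can narrow down where the runtime difference comes from. The following is only a minimal sketch of that idea, not code from the application in question: `median_ms` is a hypothetical helper name, and it assumes each stage can be wrapped in a callable that blocks until its GPU work has finished.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (an assumption for illustration, not part of the
// original application): runs `stage` for `warmup` untimed iterations, then
// times `runs` iterations and returns the median wall-clock time in ms.
//
// For GPU work, `stage` must block until the device has finished (for
// example by calling cudaDeviceSynchronize() before returning); otherwise
// only the asynchronous launch overhead is measured.
template <typename Stage>
double median_ms(Stage&& stage, std::size_t runs = 20, std::size_t warmup = 3) {
    assert(runs > 0);

    // Untimed warm-up iterations absorb one-time costs such as lazy
    // initialization on the first call.
    for (std::size_t i = 0; i < warmup; ++i) stage();

    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        stage();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }

    // The median (upper median for an even sample count) is less sensitive
    // to occasional slow iterations than the mean.
    const auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

Calling something like `median_ms([&] { run_stage(); })` for each stage on both boards, where `run_stage` stands in for whatever the application actually does, would give a rough table of which stages slowed down. That table can then guide where to point the profiler.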

Questions about NVIDIA’s embedded products receive better/faster/more numerous answers in the sub-forums dedicated to them:

A poster in this forum recently reported a significant performance regression when moving from Xavier to Orin. Other than the larger L2 cache of the host CPU in Xavier, all hardware characteristics would suggest that Orin is the faster platform. It is therefore not clear what may be going on, as the difference in CPU L2 size seemed unlikely to explain the magnitude of the performance difference that poster observed.

Expertise for these platforms is definitely concentrated in the dedicated sub-forums.