No speed-up on Jetson Xavier NX

Hi, I modified a histogram-based median filtering algorithm (in CUDA) by replacing its if-else branches with plain expressions, and got a 50%+ decrease in run time on my PC (GPU: RTX 2080 Ti, compute capability 7.5).
But when I ran the same algorithm on a Jetson Xavier NX (compute capability 7.2), there was no improvement in run time. Does anyone know why?
Any suggestions are welcome.
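
To make the change concrete, here is a minimal sketch of what replacing an if-else branch with a plain expression can look like when finding the median from a 256-bin histogram. It is not the actual kernel from this post: the function names `median_branchy` and `median_branchless` are hypothetical, and it is written as host-side C++ so it can be run anywhere. The same loop bodies could sit inside a CUDA device function. How much the branch-free form helps may differ between GPUs and compiler versions, since the compiler can already predicate short branches on its own.

```cpp
#include <cstdint>

// Hypothetical helper: walk the histogram and stop at the first bin
// whose cumulative count reaches the median rank (uses a branch).
int median_branchy(const int* hist, int total) {
    const int target = (total + 1) / 2;  // rank of the median element
    int cum = 0;
    for (int b = 0; b < 256; ++b) {
        cum += hist[b];
        if (cum >= target) {
            return b;  // data-dependent early exit
        }
    }
    return 255;
}

// Hypothetical helper: same result without a data-dependent branch.
// The median bin equals the number of bins whose cumulative count is
// still below the target, so the comparison result is added directly.
int median_branchless(const int* hist, int total) {
    const int target = (total + 1) / 2;
    int cum = 0;
    int median = 0;
    for (int b = 0; b < 256; ++b) {
        cum += hist[b];
        median += static_cast<int>(cum < target);  // adds 0 or 1
    }
    return median;
}
```

The branch-free version always visits all 256 bins, so it trades extra arithmetic for uniform control flow. Whether that trade pays off depends on the device it runs on.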


Are you comparing the Xavier NX with an RTX 2080 Ti?

I compared their technical specifications in the Programming Guide :: CUDA Toolkit Documentation
but found no great difference. I don't know why there is no speed-up on the NX.
By the way, the NX is 2 times slower than the 2080 Ti. I wonder if the NX hardware is weaker than the 2080 Ti despite their similar compute capability.