Hey Akmal.ali
Yeah, I actually referenced your thread for help on memory-mapping the pins (so thank you). I've also made sure my pointers are volatile and everything.
My actual loop and code (sans the memory-mapping part, of course) were originally written for a very old TI DaVinci processor, and they ran much faster there. That, plus your experience with the TX1 running the same code at speed, suggests to me that the code itself should be fine and that the problem is something specific to the TX2.
Hopefully Nvidia can give some additional insight.