GPIO Bit-Bang Speed Increase

Hi, I have the same problem on the TX2, for a similar purpose. The same GPIO-toggling code runs much faster on the TX1 than on the TX2.
See:
https://devtalk.nvidia.com/default/topic/1041993/bitbanging-gpio-lines-on-tx2/?offset=3#5285567

Could anyone from NVIDIA comment on the observed difference, and on the best way to set and clear GPIO pins?
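
For anyone who wants to compare the two boards with the same measurement, here is a minimal sketch of a toggle-rate test. It assumes the legacy sysfs GPIO interface (a `value` file under `/sys/class/gpio/`), which may not be how the code in the linked thread drives the pins. The function name `measure_toggle_rate` and the pin number in the usage note are hypothetical, and the pin is assumed to be already exported and configured as an output.

```python
import os
import time


def measure_toggle_rate(value_path, toggles=10000):
    """Write alternating b"1"/b"0" to value_path and return writes per second.

    value_path is assumed to be a sysfs GPIO value file, for example
    /sys/class/gpio/gpio<N>/value, for a pin that is already exported
    and set to direction "out". The number measures the software write
    path only; it is not the electrical toggle rate seen on a scope.
    """
    fd = os.open(value_path, os.O_WRONLY)
    try:
        start = time.perf_counter()
        for i in range(toggles):
            # Even iterations drive the pin high, odd iterations drive it low.
            os.write(fd, b"1" if i % 2 == 0 else b"0")
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return toggles / elapsed
```

Running something like `measure_toggle_rate("/sys/class/gpio/gpio398/value")` on both the TX1 and the TX2 (398 is only a placeholder pin number) would give two figures that can be compared directly. Sysfs writes go through a system call each time, so this is unlikely to be the fastest way to set or clear a pin on either board.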