mmapped GPIO registers

I am moving from a TX1 to a TX2 on a custom board. I program the FPGA on the board by writing directly to a set of GPIO pins: I open /dev/mem, mmap the GPIO registers into userland, and write to them directly. On the TX1, this lets me toggle a pin at ~16 MHz. On the TX2, however, I can only manage ~2 MHz. The rate seems unaffected by the processor clock speed, so I wonder whether something changed between the two modules that causes this behaviour?

I am writing to the GPIO_CNF_, GPIO_OE, GPIO_OUT, and GPIO_IN registers for the relevant pins. I have already hogged the pins in the kernel dtb file, and programming the FPGA through these registers works.
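
For reference, here is a minimal sketch of the access pattern described above; it is not my exact code. The function names (map_regs, toggle_bit, measure_toggle_hz) are made up for illustration, and the physical base address, mapping length, and bit number are placeholders that have to come from the technical reference manual for the module in question. The timing helper counts register writes per second, so one full on/off cycle of the pin is two writes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Map `len` bytes of physical address space starting at `phys_base`
 * (assumed page-aligned) through /dev/mem. Returns NULL on failure.
 * Both arguments are placeholders; take real values from the TRM. */
volatile uint32_t *map_regs(off_t phys_base, size_t len)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, phys_base);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    return (p == MAP_FAILED) ? NULL : (volatile uint32_t *)p;
}

/* Flip one bit of an output register `count` times.
 * The register is read once up front; the loop then only writes,
 * so each iteration costs a single store to the device. */
void toggle_bit(volatile uint32_t *reg, unsigned bit, unsigned long count)
{
    assert(bit < 32);
    const uint32_t mask = UINT32_C(1) << bit;
    uint32_t shadow = *reg;
    for (unsigned long i = 0; i < count; i++) {
        shadow ^= mask;
        *reg = shadow;
    }
}

/* Time `count` writes and return writes per second
 * (0.0 if the elapsed time could not be resolved). */
double measure_toggle_hz(volatile uint32_t *reg, unsigned bit,
                         unsigned long count)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    toggle_bit(reg, bit, count);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? (double)count / secs : 0.0;
}
```

In use, the pointer returned by map_regs would be offset to the GPIO_OUT register for the pin's port and passed to measure_toggle_hz with a large count (for example one million). Running the same loop against an ordinary variable in RAM gives a rough upper bound for the CPU side, which helps separate processor speed from the cost of each write reaching the GPIO controller.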