Best algorithm for 2D FFT on reals

Can someone give me their opinion on what the current state of the art algorithm for 2D FFT on reals is? For 1D transforms, I know I can interleave the reals and do a complex to complex transform, using the speedy implementations by the Microsoft or Berkeley guys. However for 2D, it seems a simple row, column algorithm is not good due to the need for transposes.

It seems to get the best performance, you need to use an inherently 2D FFT algorithm, such as the ones described here. But I don’t know how good they are because in 1988, there was no memory wall so they only analyzed #multiplies and adds.