I keep getting burned by the fact that cuFFT indexes with z as the fastest-varying index, whereas making x the fastest index is best practice for CUDA. By burned I mean I waste hours debugging what turns out to be this issue. Although I’m unlikely to forget again after this frustration, it makes my code messier and often much slower. (When a kernel reads an array in one indexing pattern and writes an array in the other, you can’t coalesce both the reads and the writes.) Is there any workaround, or some way to change how cuFFT reads arrays?