Does CUDA provide fftshift() function like matlab?

Dear All,
I am curious whether CUDA has an fftshift function (like the one in MATLAB) available in its libraries?

Best,
ly

Hi Sushiman,

ArrayFire is a CUDA-based library developed by us (AccelerEyes) that expands on the functions provided by the default CUDA toolkit.

The library contains many functions that are useful in scientific computing, including shift.

shift performs a circular shift by the specified shift amounts.

Implementing fftshift and ifftshift is trivial once you have shift. You can use the following macros:

#define fftshift(in)  shift(in, in.dims(0)/2, in.dims(1)/2)

#define ifftshift(in) shift(in, (in.dims(0)+1)/2, (in.dims(1)+1)/2)
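As a sanity check on those shift amounts (sketched here in NumPy rather than ArrayFire, so `np.roll` stands in for shift), rolling by floor(n/2) and ceil(n/2) along each axis reproduces NumPy's own fftshift and ifftshift:

```python
import numpy as np

def fftshift_via_roll(x):
    # Circular shift by floor(n/2) along each axis, as in the fftshift macro.
    return np.roll(x, [d // 2 for d in x.shape], axis=tuple(range(x.ndim)))

def ifftshift_via_roll(x):
    # Circular shift by ceil(n/2) = (n + 1) // 2, as in the ifftshift macro.
    return np.roll(x, [(d + 1) // 2 for d in x.shape], axis=tuple(range(x.ndim)))

a = np.arange(15).reshape(3, 5)  # odd sizes, where the two shifts differ
assert np.array_equal(fftshift_via_roll(a), np.fft.fftshift(a))
assert np.array_equal(ifftshift_via_roll(a), np.fft.ifftshift(a))
```

For even sizes the two shift amounts coincide; the distinction only matters for odd lengths, which is why ifftshift exists as a separate function.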

P.S. In case you are wondering ArrayFire and its python, Fortran wrappers are free to use for most cases. You can visit the http://www.accelereyes.com/products/arrayfire to find out more.

Great Help. Thanks.

Hi Sushiman,
let me just mention that you might be interested in the following kernels

__global__ void fftshift_1D(double2 *u_d, int N)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
    {
        double a = pow(-1.0, i & 1);
        u_d[i].x *= a;
        u_d[i].y *= a;
    }
}

and

#define IDX2R(i,j,N) (((i)*(N))+(j))

__global__ void fftshift_2D(double2 *data, int N1, int N2)
{
    int i = threadIdx.y + blockDim.y * blockIdx.y;
    int j = threadIdx.x + blockDim.x * blockIdx.x;

    if (i < N1 && j < N2)
    {
        double a = pow(-1.0, (i + j) & 1);
        data[IDX2R(i, j, N2)].x *= a;
        data[IDX2R(i, j, N2)].y *= a;
    }
}

as an alternative to the standard fftshift. Differently from MATLAB's fftshift, these kernels do not swap parts of vectors or matrices. Instead, they multiply the vector or matrix to be transformed by a sequence of 1s and -1s, which is equivalent to multiplication by exp(-j*n*pi) (in the 1D case) and thus to a shift in the conjugate domain.

You have to call these kernels before and after applying cuFFT.

One pro is that memory movements are avoided.
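The equivalence is easy to verify in NumPy for even sizes (a quick check of the idea, not the CUDA code itself): flipping the sign of every other input sample gives the same spectrum as applying fftshift after the FFT.

```python
import numpy as np

# 1D, even length: fft(x * (-1)^n) == fftshift(fft(x))
x = np.random.rand(8)
n = np.arange(x.size)
assert np.allclose(np.fft.fft(x * (-1.0) ** n),
                   np.fft.fftshift(np.fft.fft(x)))

# 2D, even sizes: multiply by (-1)^(i+j), as fftshift_2D does
X = np.random.rand(4, 6)
i, j = np.indices(X.shape)
assert np.allclose(np.fft.fft2(X * (-1.0) ** (i + j)),
                   np.fft.fftshift(np.fft.fft2(X)))
```

The multiplication by (-1)^n is exactly a modulation by exp(j*2*pi*(N/2)*n/N), which circularly shifts the spectrum by N/2 bins, i.e. fftshift for even N.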

Following the suggestion received at https://devtalk.nvidia.com/default/topic/527302/cuda-programming-and-performance/coalesced-memory-access-and-global-memory-load-store-efficiency-with-complex-arithmetics-in-cuda/?offset=3#3729796, you can change the line

double a = pow(-1.0,(i+j)&1);

to

double a = 1-2*((i+j)&1);

to avoid the use of the slow routine pow.
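The replacement is exact: for an integer m, 1 - 2*(m & 1) is +1 when m is even and -1 when m is odd, the same values pow(-1.0, m & 1) produces, without the transcendental-function call. A quick check:

```python
# The branch-free expression matches (-1)^m for every parity.
for m in range(16):
    assert 1 - 2 * (m & 1) == (-1.0) ** (m & 1)
```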

I have recently published a paper about a generic FFT-shift implementation in both the spatial and frequency domains, in case you can't exploit the property of doing the shift in the conjugate domain as JFSebastian described. My implementation covers 1D, 2D, and 3D FFT-shifts for flat arrays.

http://marwan-abdellah.com/CUDA_FFTSHIFT.pdf

You will find the code on github (https://github.com/marwan-abdellah/cufftShift)

All the best.

I have read your paper.

Can you provide examples of when you cannot apply the scheme above avoiding memory movements?

JFSebastian,

Thanks for your solution, this works great!

I’ve noticed one small problem with it, however: it only works when the number of samples input to the FFT is an EVEN number. When I input an ODD number of samples, the shift doesn’t work properly.

Working example with an even number of elements (matlab output):

>> fftshift(fft([1 2 3 4]))

ans =

  -2.0000 + 0.0000i  -2.0000 - 2.0000i  10.0000 + 0.0000i  -2.0000 + 2.0000i

>> fft([1 -2 3 -4])

ans =

  -2.0000 + 0.0000i  -2.0000 - 2.0000i  10.0000 + 0.0000i  -2.0000 + 2.0000i

Good, everything matches!

Now if we add an extra element to the vector (making it [1 2 3 4 5]) so that it has an ODD number of elements, the shifting no longer works as expected.

>> fftshift(fft([1 2 3 4 5]))

ans =

  -2.5000 - 0.8123i  -2.5000 - 3.4410i  15.0000 + 0.0000i  -2.5000 + 3.4410i  -2.5000 + 0.8123i

>> fft([1 -2 3 -4 5])

ans =

   3.0000 + 0.0000i   2.7361 + 2.5429i  -1.7361 +10.7719i  -1.7361 -10.7719i   2.7361 - 2.5429i

Any idea on how to make this work for an odd number of elements?

You can get the odd-length shift by multiplying by pow(pow(-1, (N+1)/N), i), where N is the length; with complex arithmetic, pow(-1, (N+1)/N) is exp(j*pi*(N+1)/N). That’s obviously a more complicated computation and can’t be reduced to the bitwise operations used above, but there it is. Here’s Python code (note that (N+1)/N relies on true division, as in Python 3):

>>> import math
>>> import numpy as np
>>> A = np.arange(1,8)
>>> A
array([1, 2, 3, 4, 5, 6, 7])
>>> inx = np.arange(7)
>>> fact = pow(-1, (A.size + 1)/A.size)
>>> afact = np.power(fact,inx)
>>> afact
array([ 1.        +0.j        , -0.90096887-0.43388374j,
        0.6234898 +0.78183148j, -0.22252093-0.97492791j,
       -0.22252093+0.97492791j,  0.6234898 -0.78183148j,
       -0.90096887+0.43388374j])
>>> np.fft.ifft ( np.fft.fft(A) * afact)
array([5.-4.44089210e-16j, 6.-5.98212619e-16j, 7.-3.75800343e-16j,
       1.+1.47593148e-15j, 2.+3.98770332e-16j, 3.-5.04049884e-17j,
       4.-4.06194651e-16j])
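Assuming NumPy, the session above can be condensed into a direct check against np.fft.fftshift: multiplying the spectrum by those complex factors and inverse-transforming yields a circular shift by floor(N/2) samples, which is exactly what fftshift does in the "time" domain, even for odd N.

```python
import numpy as np

N = 7
A = np.arange(1, N + 1)
# exp(j*pi*(N+1)/N) raised to the bin index, as in the session above;
# a negative base with a non-integer exponent gives a complex result.
fact = (-1) ** ((N + 1) / N)
afact = fact ** np.arange(N)
shifted = np.fft.ifft(np.fft.fft(A) * afact)
# Up to floating-point noise, this is fftshift(A) = [5, 6, 7, 1, 2, 3, 4].
assert np.allclose(shifted, np.fft.fftshift(A))
```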