I believe I have uncovered a bug in CUDA / CUFFT. It turns out that if you launch a kernel with 0 threads, a subsequent CUFFT call will fail.
For a variety of reasons I typically launch one kernel covering a whole number of blocks (block size times grid size) and then launch whatever doesn’t fit
as a second kernel with a ‘residual’ block size. That residual size is zero whenever the product of block and grid size equals my problem size.
This has worked for me so far, with no complaints about the empty kernel calls, until I tried running an FFT afterwards. An empty kernel call ruins the FFT.
Here is a simple example, fft.cu, that reproduces the problem.
[codebox]
#include <stdio.h>
#include <math_functions.h>
#include "cublas.h"
#include "cufft.h"
#include "cutil.h"
__global__ void vec_set(float *v, unsigned off)
{
unsigned j = threadIdx.x + blockIdx.x * blockDim.x + off ;
unsigned j2 = j << 1 ;
v[j2] = 1.0 ;
v[j2+1] = 0.0 ;
}
int main(int argc, char *argv[])
{
char ch ;
cublasInit() ;
#define NX 256
#define BATCH 1
dim3 block ;
dim3 grid ;
dim3 residual ;
block.x = NX ;
block.y = BATCH ;
grid.x = 1 ;
residual.x = 0 ;
residual.y = block.y ;
cufftHandle plan ;
cufftComplex *data ;
cudaMalloc((void**)&data, sizeof(cufftComplex)*NX*BATCH);
/* Create a 1D FFT plan. */
cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH);
/* run a kernel */
vec_set<<<grid,block>>>((float *)data, 0) ;
/* The next line breaks the fft code. It launches a kernel with 0 threads */
vec_set<<<1,residual>>>((float *)data, block.x * grid.x) ;
/* Use the CUFFT plan to transform the signal in place. */
cufftExecC2C(plan, data, data, CUFFT_FORWARD);
/* Inverse transform the signal in place. */
cufftExecC2C(plan, data, data, CUFFT_INVERSE);
cublasShutdown() ;
printf("Goodbye Cruel World\n") ;
ch = getchar() ;
}
[/codebox]
Launching a kernel with zero block size or grid size should cause nothing to get executed and certainly shouldn’t break the FFT code.
It’s a bug!
Here is the error output:
cufft: ERROR: D:/Bld/rel/gpgpu/toolkit/r2.1/cufft/src/execute.cu, line 1070
cufft: ERROR: CUFFT_EXEC_FAILED
cufft: ERROR: D:/Bld/rel/gpgpu/toolkit/r2.1/cufft/src/cufft.cu, line 151
cufft: ERROR: CUFFT_EXEC_FAILED
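In the meantime, one way to sidestep the problem is simply not to issue the empty launch. Below is a minimal host-side sketch of that idea, assuming the full-block/residual split is computed with integer division. The names `launch_split`, `split_launch`, and `needs_launch` are hypothetical helpers for illustration, not part of the CUDA API.

[codebox]
#include <assert.h>

/* Hypothetical helper type (not a CUDA type): how a problem of n elements
   is split into full blocks plus one leftover launch. */
typedef struct {
    unsigned full_blocks;  /* grid size for the main launch */
    unsigned residual;     /* threads left for the second launch; may be 0 */
} launch_split;

/* Split n elements into full blocks of block_size threads plus a residual. */
static launch_split split_launch(unsigned n, unsigned block_size)
{
    launch_split s;
    assert(block_size > 0);          /* avoid division by zero */
    s.full_blocks = n / block_size;
    s.residual    = n % block_size;
    return s;
}

/* Nonzero only when a launch of `threads` threads would do any work. */
static int needs_launch(unsigned threads)
{
    return threads > 0;
}
[/codebox]

With NX = 256 and a block size of 256, as in fft.cu above, `split_launch` gives one full block and a residual of 0, so wrapping the second `vec_set` launch in `if (needs_launch(s.residual))` would skip it, and the FFT calls would then run without a zero-thread launch preceding them.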