Hi folks, first-time poster here, so I’ll try to give sufficient info.
I am using a split-step FFT scheme to find the ground state of a quantum system. My code runs successfully with a grid size of 128x256x256 (8388608 cufftDoubleComplex elements), but I have been unable to go any higher. It had been running on a 1GB GTX 560 Ti, so we suspected a memory issue and upgraded to a GTX 580 with 3GB. However, the same issue arises. The code computes a forward Z2Z FFT, multiplies by complex values, then performs an inverse Z2Z FFT. The forward FFT appears to execute successfully, but the inverse FFT returns 6 (CUFFT_EXEC_FAILED). I am currently stumped by this, and any insight would be greatly appreciated.
I have included some code and sample output:
Note: wfc is the “signal” on the host; fft, buffer1 and buffer2 are three memory blocks allocated on the device; gSize is the total number of grid points (i.e. 128*256*256).
.
.
.
for(step=0; step<MAX_STEPS; step++) {
    local_sum = 0.0; sum = 0.0;

    cudaMemGetInfo( &avail, &total );
    used = total - avail;
    printf("9 MEMORY U/A/T: %u / %u / %u\n",used,avail,total);

    cudaMemcpy(fft, wfc, sizeof(cufftDoubleComplex)*alloc_local, cudaMemcpyHostToDevice);
    result = cufftExecZ2Z(plan_f,fft,fft,CUFFT_FORWARD);

    cudaMemGetInfo( &avail, &total );
    used = total - avail;
    printf("10 MEMORY U/A/T: %u / %u / %u\n",used,avail,total);
    isError(result,"Z2Z 1");

    scalarDiv<<<alloc_local/256,256>>>(fft,pow(gSize,0.5),fft); //Normalise
    cMult<<<alloc_local/256,256>>>(buffer1,fft,fft); //EKp complex Mult

    //Complex to Complex Transform (Inverse) 2
    result = cufftExecZ2Z(plan_b,fft,fft,CUFFT_INVERSE);
    isError(result,"Z2Z 2");
    scalarDiv<<<alloc_local/256,256>>>(fft,pow(gSize,0.5),fft); //Normalise
    cMult<<<alloc_local/256,256>>>(buffer2,fft,fft); //EVr complex Mult

    //Complex to Complex Transform (Forward) 3
    result = cufftExecZ2Z(plan_f,fft,fft,CUFFT_FORWARD);
    isError(result,"Z2Z 3");
    scalarDiv<<<alloc_local/256,256>>>(fft,pow(gSize,0.5),fft);
    cMult<<<alloc_local/256,256>>>(buffer1,fft,fft);

    //Complex to Complex Transform (Inverse) 4
    result = cufftExecZ2Z(plan_b,fft,fft,CUFFT_INVERSE);
    isError(result,"Z2Z 4");
    scalarDiv<<<alloc_local/256,256>>>(fft,pow(gSize,0.5),fft);

    cudaMemcpy(wfc, fft, sizeof(cufftDoubleComplex)*alloc_local, cudaMemcpyDeviceToHost);
Aside:
For a grid size of 256x256x256, the GTX 580 gives:
9 MEMORY U/A/T: 1695776768 / 1524989952 / 3220766720
10 MEMORY U/A/T: 1695776768 / 1524989952 / 3220766720
Error has occurred for method Z2Z 2 with return type 6
whereas the 560 gives:
9 MEMORY U/A/T: 1004765184 / 68517888 / 1073283072
10 MEMORY U/A/T: 1004765184 / 68517888 / 1073283072
Error has occurred for method Z2Z 2 with return type 6
Thanks in advance,
Lee.