CUFFT bug: problem with inversion of the 3D Fourier transform

Good day,

I’m implementing in ANSI C some algorithms to reconstruct volumes from 2D CT projections in parallel-beam and cone-beam geometry.
To invert the 3D Fourier transform in my implementation of the projection-slice theorem (or Fourier slice theorem), I use the CUFFT library to run the algorithm on the GPU and FFTW to run it on the CPU,
and then I visualize the results with MATLAB.

With volumes of 300×300×300 (27000000 voxels) and 350×350×350 (42875000 voxels) there are no problems.

With bigger volumes, 380×380×380 (54872000 voxels), you can see from the image posted below (Confronto_Ric_Vol_380x380x380_900pr.jpeg) that the reconstruction with CUFFT (3rd row)
shows some artifacts, whereas the reconstruction with FFTW is fine (2nd row).

When I try volumes of 400×400×400 (64000000 voxels), the reconstruction with CUFFT returns a volume that is entirely zero except in the middle, as you can see in the second image posted (Ric_vol_FourierSliceCUFFT_400.jpeg).

I suspect that this issue arises because my GPU supports only single-precision floating point, whereas FFTW on the CPU works in double precision, but I would like a more precise answer from NVIDIA, if possible.

My system is:
Ubuntu 10.04 64-bit, CUDA 3.1.1, dev driver 260.24, Athlon X2 7850, 4 GB RAM, GeForce GTS 250 with 1 GB VRAM.

Thanks a lot,
Andrea Dossi

It sounds like you might be running out of GPU memory. Are you checking the return codes from cufftExec() to be sure it returns success?
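
A rough back-of-the-envelope check of the memory hypothesis can be sketched in plain C. It assumes the volume is stored as single-precision complex values (two floats per voxel) and, more speculatively, that the library needs a work area about as large as the data itself; that second figure is a guess for illustration, not a documented number. The helper names volume_bytes and estimated_fft_bytes are made up for this sketch.

```c
/* Bytes needed to hold an n x n x n volume of single-precision complex
 * values: two 4-byte floats per voxel. Hypothetical helper name. */
static unsigned long long volume_bytes(unsigned long long n)
{
    return n * n * n * 2ULL * (unsigned long long)sizeof(float);
}

/* Rough footprint of one in-place 3D transform.
 * ASSUMPTION: the library's work area is about as large as the data
 * itself; the real figure depends on the CUFFT version and the size. */
static unsigned long long estimated_fft_bytes(unsigned long long n)
{
    return 2ULL * volume_bytes(n);
}
```

For a 400×400×400 volume, volume_bytes(400) is 512,000,000 bytes (roughly 488 MiB), and the estimate including the assumed work area is roughly 977 MiB, which leaves very little of a 1 GB card once the display and any other buffers are counted. For 380×380×380 the same estimate is roughly 837 MiB.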

Thanks,

Cliff

When I check the return value of cufftExec while trying to reconstruct a 400×400×400 voxel volume, it is CUFFT_EXEC_FAILED.

OK, but that answers only my second question, because cufftExec with a 380×380×380 volume returns CUFFT_SUCCESS, yet there are a lot of artifacts in the CUFFT reconstruction that are not present in the FFTW reconstruction, as you can see in the first image posted and in this new image (much better).

In these reconstructions no filters are applied; the only difference between the algorithms is the library used to compute the inverse transform.
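
To put a number on the difference instead of judging it by eye, one option is to compare the two reconstructions voxel by voxel. The following is only a sketch: it assumes both volumes have been copied to host memory as flat arrays of the same length, the CUFFT result as float and the FFTW result as double, and the function name max_relative_error is hypothetical.

```c
#include <stddef.h>

/* Largest absolute difference between a single-precision result (for
 * example, copied back from the GPU) and a double-precision reference,
 * divided by the largest absolute value in the reference.
 * Returns 0 when the reference is all zeros. Hypothetical helper. */
static double max_relative_error(const float *test, const double *ref,
                                 size_t count)
{
    double max_diff = 0.0, max_ref = 0.0;
    size_t i;

    for (i = 0; i < count; ++i) {
        double diff = (double)test[i] - ref[i];
        double mag = ref[i];
        if (diff < 0.0) diff = -diff;
        if (mag < 0.0) mag = -mag;
        if (diff > max_diff) max_diff = diff;
        if (mag > max_ref) max_ref = mag;
    }
    return (max_ref > 0.0) ? max_diff / max_ref : 0.0;
}
```

As a rough guide, a value within a few orders of magnitude of single-precision machine epsilon (about 1.2e-7) would be consistent with ordinary rounding, while a much larger value suggests something other than precision alone.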

Thanks a lot for your patience.

up?

Can nobody from NVIDIA tell me what causes (or could cause) these artifacts with volumes bigger than 350×350×350?

cufftExec returns CUFFT_SUCCESS, so I expect the result to be correct, but it isn’t.

This issue is probably related to the fact that my graphics card supports only single-precision floating point, but I would like confirmation from NVIDIA
because it’s very important for my thesis and I think it should be reported in the library documentation.

Thanks,
Andrea Dossi

Just a guess…but the prime factors of

300 are 2, 2, 3, 5, 5

350 are 2, 5, 5, 7

380 are 2, 2, 5, 19

Depending on the implementation, that fairly large prime factor (19) could be causing accuracy issues with that transform. Do you see similar problems with a 2D or 1D transform of that dimension? 378 and especially 384 have a much nicer set of prime factors.
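
A quick way to test candidate sizes against this guess is sketched below in plain C. The set {2, 3, 5, 7} is an assumption about which factors are handled well, taken from the radices mentioned elsewhere in this thread, and the function names are made up for the example.

```c
/* Returns 1 if n has no prime factor other than 2, 3, 5 or 7, else 0.
 * ASSUMPTION: these are the factors the FFT library handles best. */
static int has_only_small_factors(unsigned int n)
{
    static const unsigned int primes[] = {2, 3, 5, 7};
    unsigned int i;

    if (n == 0)
        return 0;
    for (i = 0; i < 4; ++i)
        while (n % primes[i] == 0)
            n /= primes[i];
    return n == 1;
}

/* Smallest size >= n whose prime factors are all in {2, 3, 5, 7}. */
static unsigned int next_friendly_size(unsigned int n)
{
    if (n == 0)
        n = 1;
    while (!has_only_small_factors(n))
        ++n;
    return n;
}
```

With these helpers, 300, 350, 378 and 384 all pass the check, 380 does not, and next_friendly_size(380) gives 384.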

This is likely a very good guess at what’s going on.

For this reason, actually, I was just about to ask whether you (ramarromarrone) have tried CUFFT 3.2 RC. The accuracy for sizes with prime factors other than 2, 3, 5, or 7 should be greatly improved in 3.2.

Thanks,

Cliff

I followed your advice and tried volumes of 378×378×378 and 384×384×384.
Now it works very well (with CUDA 3.1!!! **); in the image CUDA_3.1.jpeg you can see that the artifact appears only in the 380×380×380 volume.
The image Ric_vol_FS_CUFFT_384_900pr.jpeg shows that the entire volume is well reconstructed.

My problem is resolved, so thank you very much for your support!

** PS:
I also tried CUDA toolkit 3.2 RC and it works very badly, as you can see in the image CUDA_RC3.2.jpeg.
I know it’s a release candidate, but the results are terrible! I had to downgrade to CUDA 3.1.

I’d definitely like to be sure that the issue you saw in 3.2 RC1 is fixed up. Are you getting errors returned back from cufftExec with 3.2 RC1?

Just yesterday I retried CUDA toolkit 3.2 RC.

cufftExec returned CUFFT_SUCCESS, but the issues were the same on both the PC and the notebook.

PC configuration:

Ubuntu 10.10 64-bit, gcc 4.5, dev driver 260.24, Eclipse with CUDA plugin, Athlon X2 7850, 4 GB RAM, GeForce GTS 250 with 1 GB VRAM.

Notebook configuration:

Ubuntu 10.10 64-bit, gcc 4.5, dev driver 260.24, Eclipse with CUDA plugin, Core i3 330M, 4 GB RAM, GeForce GT 320M with 1 GB VRAM.

For me it’s not a problem, because I can work (and actually work very well) on both the PC and the notebook with CUDA 3.1.1, but if you need more information about these problems with 3.2 RC, I can send you more images, code, etc.
