My CUFFT-related code has stopped working since I installed CUDA 2.2. Any tips would be appreciated.
I installed the following two packages:
cudasdk_2.2_macos.pkg
cudatoolkit_2.2_macos_32.pkg
Most of the toolkit examples run OK, so I can't tell whether the problem is specific to CUFFT. Most of the CUFFT examples fail, but not all of them do (note that the failing ones report 0.00 MPix/s):
$ ./simpleCUFFT
Using device 0: GeForce 9600M GT
GPU time: 229.100006 msecs. //0.000000 MPix/s
Test FAILED
Press ENTER to exit...
$ ./simpleCUFFT2
Using device 0: GeForce 9600M GT
GPU time: 229.839996 msecs. //0.000000 MPix/s
Test FAILED
$ ./convolutionFFT2D
Using device 0: GeForce 9600M GT
Input data size : 1000 x 1000
Convolution kernel size : 7 x 7
Padded image size : 1006 x 1006
Aligned padded image size : 1024 x 1024
Allocating memory...
Generating random input data...
Creating FFT plan for 1024 x 1024...
Uploading to GPU and padding convolution kernel and input data...
...initializing padded kernel and data storage with zeroes...
...copying input data and convolution kernel from host to CUDA arrays
...binding CUDA arrays to texture references
...padding convolution kernel
...padding input data array
Transforming convolution kernel...
Running GPU FFT convolution...
GPU time: 42.544998 msecs. //23.504526 MPix/s
Reading back GPU FFT results...
Checking GPU results...
...running reference CPU convolution
...comparing the results
Max delta / CPU value 1.588891E-06
L2 norm: 1.902640E-07
TEST PASSED
Shutting down...
Press ENTER to exit...
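As a quick sanity check on the numbers above, here is a minimal Python sketch of how the MPix/s figure appears to be derived. It assumes the samples compute throughput as input pixels divided by GPU time; that formula is inferred from the convolutionFFT2D log (1000 x 1000 input, 42.544998 msecs), not taken from the sample source, and the function name `mpix_per_sec` is made up for illustration.

```python
def mpix_per_sec(width, height, gpu_ms):
    """Throughput in megapixels per second for one width x height image.

    Assumed formula (inferred from the log output, not from the sample source):
    pixels / seconds, scaled to millions of pixels.
    """
    megapixels = width * height * 1e-6
    seconds = gpu_ms * 1e-3
    return megapixels / seconds


# Passing sample: 1000 x 1000 input, GPU time 42.544998 msecs.
print(f"{mpix_per_sec(1000, 1000, 42.544998):.6f} MPix/s")

# With a positive GPU time, this formula only yields zero for a zero pixel count.
print(f"{mpix_per_sec(0, 0, 229.100006):.6f} MPix/s")
```

If that assumption holds, the passing run reproduces the logged value of roughly 23.5 MPix/s, and a reading of 0.000000 MPix/s next to a non-zero GPU time says nothing about speed: it only suggests the pixel count fed into the calculation was zero.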
My device is the first of the two listed below (Mac OS X 10.5.7, MacBook Pro):
$ ./deviceQuery
There are 2 devices supporting CUDA
Device 0: "GeForce 9600M GT"
Major revision number: 1
Minor revision number: 1
Total amount of global memory: 536543232 bytes
Number of multiprocessors: 4
Number of cores: 32
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 0.78 GHz
Concurrent copy and execution: No
Device 1: "GeForce 9400M"
Major revision number: 1
Minor revision number: 1
Total amount of global memory: 266010624 bytes
Number of multiprocessors: 2
Number of cores: 16
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 0.25 GHz
Concurrent copy and execution: No
Test PASSED
Press ENTER to exit...