This post presents some results from my installation of the CUDA 1.0 Toolkit and SDK and my testing of the sample programs on a GeForce 8600GT card. Eventually, all the sample apps executed on the hardware, but there were three kinds of failure along the way: some apps failed because they required too much device memory, the graphics apps failed with dual monitors unless in Horizontal Span Mode, and the Direct3D demos failed with missing DLLs on this system.
Details and Fixes
Memory Problems: The 8600GT has 256MB RAM. Three of the sample programs ([font=“Courier”]alignedTypes, BlackScholes,[/font] and [font=“Courier”]MonteCarlo[/font]) requested more than that amount of working memory. I had to recompile them in the Debug configuration before I got a specific out-of-memory error. The original Windows popup had an error like:
Ideally, programs would determine the array sizes at runtime from the device memory size reported by [font=“Courier”]cudaGetDeviceProperties()[/font] (the [font=“Courier”]totalGlobalMem[/font] field) rather than from hardwired constants.
Here are the changes required to fit in 256MB:
alignedTypes.cu, line 190, change to: const int MEM_SIZE = 100000000; // from 200000000
BlackScholes.cu, line 187, change to: const int OPT_N = 10000000; // from 20000000
MonteCarlo.cu, line 183, change to: const int PATH_N = 40000000; // from 80000000
Dual Monitor Config: All the non-Direct3D sample programs failed when the dual-monitor setup was running in the so-called Dual Mode. Once the multi-monitor configuration was switched to Horizontal Span Mode via the Nvidia Control Panel, all of these apps worked.
The error received was
D3D Required: My system does not have DirectX 9 installed and so the D3D sample programs failed. I have not been able to install DX9, but I assume that will fix the problem with [font=“Courier”]FluidsD3D[/font] and [font=“Courier”]SimpleD3D[/font].
The specific Windows requester popup error was of the form
8600GT Sample Code Performance and Selected Output
alignedTypes.exe:
Testing misaligned types...
RGBA8_misaligned... Time: 409.733948 ms / Copy throughput: 0.227299 GB/s.
LA32_misaligned... Time: 102.097733 ms / Copy throughput: 0.912187 GB/s.
RGB32_misaligned... Time: 94.699989 ms / Copy throughput: 0.983445 GB/s.
RGBA32_misaligned... Time: 93.200600 ms / Copy throughput: 0.999267 GB/s.
Testing aligned types...
RGBA8... Time: 12.915485 ms / Copy throughput: 7.210899 GB/s.
I32... Time: 12.491215 ms / Copy throughput: 7.455821 GB/s.
LA32... Time: 11.508869 ms / Copy throughput: 8.092216 GB/s.
RGB32... Time: 138.388718 ms / Copy throughput: 0.672976 GB/s.
RGBA32... Time: 11.833643 ms / Copy throughput: 7.870126 GB/s.
RGBA32_2... Time: 23.729097 ms / Copy throughput: 3.924812 GB/s.

bandwidthTest.exe:
                            Transfer(Bytes)   Bandwidth(MB/s)
Host (page) to Device BW    33554432          1331.0
Device to Host (page) BW    33554432          1559.3
Device to Device BW         33554432          17002.2
Host (pin) to Device BW     33554432          2554.9
Device to Host (pin) BW     33554432          1654.6
Device to Device BW         33554432          17006.4

convolutionFFT2D.exe:
Running GPU FFT convolution...
GPU time: 26.229916 msecs. //38.124408 MPix/s

convolutionSeparable.exe:
GPU convolution...
GPU convolution time : 56.189529 msec //298.582604 Mpixels/sec

convolutionTexture.exe:
convolutionRowGPU() time: 40.435177 msecs; //414.916350 Mpix/s
cudaMemcpyToArray() time: 27.114140 msecs; //618.762619 Mpix/s
convolutionColumnGPU() time: 45.553261 msecs; //368.298903 Mpix/s

deviceQuery.exe:
There is 1 device supporting CUDA
Device 0: "GeForce 8600 GT"
Major revision number: 1
Minor revision number: 1
Total amount of global memory: 268107776 bytes
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1404000 kilohertz

histogram64.exe:
histogramGPU() time : 46.776257 msec //2038.799995 MB/sec
histogramCPU() time : 140.687424 msec // 677.867496 MB/sec

scanLargeArray.exe:
Average GPU execution time: 3.561453 ms
CPU execution time: 8.766033 ms
* WinXP SP2
* CUDA 1.0 Toolkit and SDK
* MS Visual C++ 2005 Express

* Intel Core2 6000 @ 2.4GHz
* 2GB RAM
* GeForce 8600GT