SDK samples on 8600GT: evaluation of the sample codes on an 8600GT card

Summary

This post presents some results from installing the CUDA 1.0 Toolkit and SDK and testing the sample programs on a GeForce 8600GT card. Eventually, the sample apps ran on the hardware, but some failed at first because they required too much device memory, the graphics apps failed with dual monitors unless Horizontal Span Mode was enabled, and the Direct3D demos failed because of missing DLLs on this system.

Details and Fixes

Memory Problems: The 8600GT has 256MB RAM. Three of the sample programs (alignedTypes, BlackScholes, and MonteCarlo) requested more than that amount of working memory. I had to recompile them in the Debug configuration before I got a specific out-of-memory error. The original Windows popup had an error like:

Ideally, programs would determine the array sizes at runtime using the output from cuDeviceGetProperties() rather than hardwired constants.

Here are the changes required to fit in 256MB:

alignedTypes.cu, line 190, change to:

    const int MEM_SIZE = 100000000;  // from 200000000

BlackScholes.cu, line 187, change to:

    const int OPT_N    =  10000000;  // from  20000000

MonteCarlo.cu, line 183, change to:

    const int PATH_N   =  40000000;  // from  80000000
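
A minimal sketch of the runtime-sizing idea, written in plain C so it does not depend on any particular CUDA call. The helper name fit_element_count, the notion of a fixed number of equally sized buffers, and the headroom percentage are all assumptions made for illustration; none of them come from the SDK samples. The total_bytes argument is meant to be whatever the device reports at runtime.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: clamp a requested element count so that all device
 * buffers fit inside a share of the reported device memory.
 *
 *   total_bytes     device memory reported at runtime (assumed input)
 *   buffers         number of equally sized buffers the program allocates
 *   bytes_per_elem  size of one element in each buffer
 *   requested       element count the program would like to use
 *   percent_usable  share of total_bytes to budget for (0..100)
 */
size_t fit_element_count(size_t total_bytes, size_t buffers,
                         size_t bytes_per_elem, size_t requested,
                         unsigned percent_usable)
{
    assert(buffers > 0 && bytes_per_elem > 0 && percent_usable <= 100);

    /* Divide before multiplying so the budget cannot overflow size_t. */
    size_t budget = total_bytes / 100 * percent_usable;
    size_t max_count = budget / (buffers * bytes_per_elem);

    return requested < max_count ? requested : max_count;
}
```

With the 268107776 bytes this card reports, two one-byte-per-element buffers, and an 80 percent budget, a request for 200000000 elements is clamped to 107243080, while a request for 100000000 passes through unchanged. The buffer count and the 80 percent figure are placeholders; each sample would need its own real numbers.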

Dual Monitor Config: All the non-Direct3D sample programs failed when run on dual monitors in so-called Dual Mode. When the multi-monitor config was set to Horizontal Span Mode via the Nvidia Control Panel, all these apps worked.

The error received was

D3D Required: My system does not have DirectX 9 installed, so the D3D sample programs failed. I have not been able to install DX9, but I assume that will fix the problem with FluidsD3D and SimpleD3D.

The specific windows requester popup error was of the form

8600GT Sample Code Performance and Selected Output

alignedTypes.exe:

    Testing misaligned types...
      RGBA8_misaligned...  Time: 409.733948 ms / Copy throughput: 0.227299 GB/s.
      LA32_misaligned...   Time: 102.097733 ms / Copy throughput: 0.912187 GB/s.
      RGB32_misaligned...  Time:  94.699989 ms / Copy throughput: 0.983445 GB/s.
      RGBA32_misaligned... Time:  93.200600 ms / Copy throughput: 0.999267 GB/s.
    Testing aligned types...
      RGBA8...             Time:  12.915485 ms / Copy throughput: 7.210899 GB/s.
      I32...               Time:  12.491215 ms / Copy throughput: 7.455821 GB/s.
      LA32...              Time:  11.508869 ms / Copy throughput: 8.092216 GB/s.
      RGB32...             Time: 138.388718 ms / Copy throughput: 0.672976 GB/s.
      RGBA32...            Time:  11.833643 ms / Copy throughput: 7.870126 GB/s.
      RGBA32_2...          Time:  23.729097 ms / Copy throughput: 3.924812 GB/s.
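
The throughput figures above appear to be the reduced MEM_SIZE of 100000000 bytes divided by the elapsed time, with 1 GB taken as 2^30 bytes. That is inferred from the numbers themselves, not from the sample source, so treat the following as a sketch for checking the arithmetic. The function name copy_throughput_gb_per_s is made up for this example.

```c
#include <assert.h>

/*
 * Copy throughput in GB/s, where 1 GB = 2^30 bytes.
 * Assumption: this is the convention behind the reported figures;
 * it reproduces them but was not read from the sample code.
 */
double copy_throughput_gb_per_s(double bytes_copied, double elapsed_ms)
{
    assert(elapsed_ms > 0.0);

    double seconds = elapsed_ms / 1000.0;
    return bytes_copied / seconds / (1024.0 * 1024.0 * 1024.0);
}
```

For example, 100000000 bytes in 12.915485 ms works out to about 7.21 GB/s, which agrees with the RGBA8 line, and the same byte count in 409.733948 ms gives about 0.227 GB/s, which agrees with the RGBA8_misaligned line.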

bandwidthTest.exe:

                                 Transfer(Bytes)    Bandwidth(MB/s)
      Host (page) to Device BW      33554432              1331.0
      Device to Host (page) BW      33554432              1559.3
      Device to Device      BW      33554432             17002.2
      Host (pin) to Device  BW      33554432              2554.9
      Device to Host (pin)  BW      33554432              1654.6
      Device to Device      BW      33554432             17006.4

convolutionFFT2D.exe:

    Running GPU FFT convolution...
      GPU time: 26.229916 msecs. //38.124408 MPix/s

convolutionSeparable.exe:

    GPU convolution...
      GPU convolution time : 56.189529 msec //298.582604 Mpixels/sec

convolutionTexture.exe:

    convolutionRowGPU()    time: 40.435177 msecs; //414.916350 Mpix/s
    cudaMemcpyToArray()    time: 27.114140 msecs; //618.762619 Mpix/s
    convolutionColumnGPU() time: 45.553261 msecs; //368.298903 Mpix/s

deviceQuery.exe:

    There is 1 device supporting CUDA

    Device 0: "GeForce 8600 GT"
      Major revision number:                         1
      Minor revision number:                         1
      Total amount of global memory:                 268107776 bytes
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       16384 bytes
      Total number of registers available per block: 8192
      Warp size:                                     32
      Maximum number of threads per block:           512
      Maximum sizes of each dimension of a block:    512 x 512 x 64
      Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
      Maximum memory pitch:                          262144 bytes
      Texture alignment:                             256 bytes
      Clock rate:                                    1404000 kilohertz

histogram64.exe:

    histogramGPU() time :  46.776257 msec //2038.799995 MB/sec
    histogramCPU() time : 140.687424 msec // 677.867496 MB/sec

scanLargeArray.exe:

    Average GPU execution time: 3.561453 ms
    CPU execution time:         8.766033 ms

System Specs

Software

* WinXP SP2

* CUDA 1.0 Toolkit and SDK

* MS Visual C++ 2005 Express

Hardware

* Intel Core2 6000 @ 2.4GHz

* 2GB RAM

* GeForce 8600GT

Regards,

Chris

Thanks for publishing the tip on using Horizontal Span Mode. That’s great.

You’ll run into one more snag if/when you try to get DX9 going (it should be easy to just install the MS DirectX SDK; I’m using the June 07 release). The fluidsD3D project hardcodes which specific G80 cards it supports, and for some reason this release has a very limited set. It doesn’t include the 8600GT or 8600GTS (or many others), and there’s no reason why it shouldn’t.

So for example, for the 8600GTS, you need to add this line to the device ID test in fluidsD3D.cu:

    ident.DeviceId == 0x400 ||

Or simply comment out the test on the device ID and leave only the assignment:

    /* if (ident.DeviceId == …) */
    adapter = i;

Otherwise you get the fun message “Failed to find a G80”
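
For anyone who would rather extend the list than comment the test out, here is a rough sketch of a table-driven check in plain C. The AdapterIdent struct is a hypothetical stand-in for the VendorId and DeviceId fields of D3DADAPTER_IDENTIFIER9, and the names kAcceptedDeviceIds and find_accepted_adapter are made up for this example; the real fluidsD3D.cu code is not reproduced here. Only the 0x400 entry comes from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two D3DADAPTER_IDENTIFIER9 fields used here. */
typedef struct {
    unsigned long VendorId;
    unsigned long DeviceId;
} AdapterIdent;

/* Device IDs to accept. Only 0x400 (8600GTS) is taken from this thread;
 * add further IDs for other cards as needed. */
static const unsigned long kAcceptedDeviceIds[] = { 0x400 };

/* Returns the index of the first adapter whose DeviceId is in the table,
 * or -1 if none matches (the "Failed to find a G80" case). */
int find_accepted_adapter(const AdapterIdent *adapters, size_t count)
{
    const size_t id_count =
        sizeof kAcceptedDeviceIds / sizeof kAcceptedDeviceIds[0];

    assert(adapters != NULL || count == 0);

    for (size_t i = 0; i < count; ++i) {
        for (size_t j = 0; j < id_count; ++j) {
            if (adapters[i].DeviceId == kAcceptedDeviceIds[j]) {
                return (int)i;
            }
        }
    }
    return -1;
}
```

Keeping the IDs in one table means supporting another card is a one-line change rather than another clause in a long if condition.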

note: all this gets fixed in the next release

BTW, we’re also working on fixing the memory size errors so they will be fixed in the next SDK update. Many of these samples were written before we had the low-end GPUs out, so it was a simple oversight.

Thanks,
Mark