I read that the GeForce 8600 GT has 32 stream processors, but when I run “deviceQuery” from the CUDA SDK 2.0 I get this:
Device 0: “GeForce 8600 GT”
Major revision number: 1
Minor revision number: 1
Total amount of global memory: 536543232 bytes
Number of multiprocessors: 2 <--------------------------------------------!!
Number of cores: 16 <------------------------------------------!!
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.19 GHz
Concurrent copy and execution: Yes
That means 8 processors per multiprocessor, with just two multiprocessors. This is the code that prints the number of cores:
printf(" Number of cores: %i\n", 8 * deviceProp.multiProcessorCount);
Maybe the driver is not detecting the number of MPs correctly?
Can anyone post their deviceQuery output from a GeForce 8600?
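For reference, here is a minimal sketch of the arithmetic behind that "Number of cores" line. It assumes the same constant deviceQuery uses here, 8 cores per multiprocessor. The names CORES_PER_MP, cores_from_mp_count and print_core_count are hypothetical helpers for illustration, not part of the SDK.

```c
#include <stdio.h>

/* Assumption: 8 cores per multiprocessor, the constant used in the
   deviceQuery printf quoted above. */
enum { CORES_PER_MP = 8 };

/* Hypothetical helper: core count implied by a reported
   multiprocessor count. */
int cores_from_mp_count(int mp_count)
{
    return CORES_PER_MP * mp_count;
}

/* Hypothetical helper: prints the line in the same format as deviceQuery. */
void print_core_count(int mp_count)
{
    printf(" Number of cores: %i\n", cores_from_mp_count(mp_count));
}
```

Calling print_core_count(2) gives the 16 cores reported above, and a card with the advertised 32 stream processors would need to report 4 multiprocessors. So the printed core count is only as good as the multiProcessorCount value the driver returns.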
Device 0: “GeForce 8600 GT”
Major revision number: 1
Minor revision number: 1
Total amount of global memory: 268435456 bytes
Number of multiprocessors: 4
Number of cores: 32
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.31 GHz
Concurrent copy and execution: No
I don’t know why for you it says 2. What brand is your card? Where did you buy it?
I realized that my GeForce is in a PCI Express x8 slot (I don’t have a free x16 slot).
Could the x8 slot be the reason only 2 multiprocessors are used? (I would expect it to affect only data-transfer bandwidth, not the processing power of the GPU.)
I’ve never heard of Point of View. Could be they messed something up setting config bits? Or lied? What do you get for matrixMul and bandwidthTest samples? I get 0.30ms and 15.7 GB/s device-to-device.
3.7 GB/s DDR bandwidth doesn’t make sense. Maybe the incorrect parameters the card reports mess up the cudaMemcpy() function (i.e., the wrong block size, etc. gets selected). But from the matrixMul test it looks like you do have 32 shaders.
I’d contact the company. I’m sure they’d like to be aware of the issue as well. (This has nothing to do with PCIe x8.)