matrixMul problem: printDiff result keeps flipping, please help

Hi all,

I have two questions:


I’m using the matrixMul example as a benchmark to see the difference in run time between the CPU and the GPU. To do this I made some changes to the original matrixMul code:

    * The matrices A and B are both initialised to 1 (so every element in each matrix is 1.0f);

    * The matrices are always square (N*N).
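With both inputs filled with 1.0f, every element of the product should equal N, which matches the CPU=256.000000 values reported for N = 256. A minimal sketch of that check in plain C follows; the function names are made up for illustration and are not taken from the SDK sample.

```c
#include <stdlib.h>

/* Fill an n*n row-major matrix with a constant value. */
void fill_const(float *m, int n, float value)
{
    for (int i = 0; i < n * n; ++i)
        m[i] = value;
}

/* Reference CPU product C = A * B for square n*n row-major matrices. */
void matmul_cpu(float *c, const float *a, const float *b, int n)
{
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Returns 1 if multiplying two all-ones n*n matrices gives n in every
   element, 0 otherwise (including allocation failure). */
int all_ones_product_ok(int n)
{
    size_t count = (size_t)n * (size_t)n;
    float *a = malloc(count * sizeof *a);
    float *b = malloc(count * sizeof *b);
    float *c = malloc(count * sizeof *c);
    int ok = (a != NULL && b != NULL && c != NULL);

    if (ok) {
        fill_const(a, n, 1.0f);
        fill_const(b, n, 1.0f);
        matmul_cpu(c, a, b, n);
        for (size_t i = 0; i < count; ++i) {
            if (c[i] != (float)n) {
                ok = 0;
                break;
            }
        }
    }
    free(a);
    free(b);
    free(c);
    return ok;
}
```

Any GPU value other than N (such as 2368 or 1 for N = 256) therefore cannot come from a correct multiplication of these inputs.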

Now I have the following problem. When I take a block size of 32 and N = 256 (65536 elements per matrix), printDiff tells me that the CPU and GPU results are not the same.


My second question: when I set the block size to 32, will the number of threads per block be 1024, or are the two independent of each other? I ask because the maximum number of threads per block is only 512, and the maximum per multiprocessor is 768 threads. Yet when I take a block size of 32 and N = 512, the CPU and GPU results are the same.
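The arithmetic behind this question can be sketched in plain C. It assumes the modified code still launches a two-dimensional block of BLOCK_SIZE x BLOCK_SIZE threads, as the SDK sample does, and it uses the 512-thread limit quoted above; the helper names are hypothetical.

```c
/* Per-block thread limit quoted in the post for the GeForce 8800 GTS.
   Treat it as an assumption for any other device. */
#define MAX_THREADS_PER_BLOCK 512

/* Threads in one block when the block is block_size x block_size. */
int threads_per_block(int block_size)
{
    return block_size * block_size;
}

/* Returns 1 if a block_size x block_size block fits within the limit. */
int launch_config_fits(int block_size)
{
    return threads_per_block(block_size) <= MAX_THREADS_PER_BLOCK;
}

/* Blocks along one grid dimension for an n x n matrix, assuming n is a
   multiple of block_size. */
int blocks_per_dim(int n, int block_size)
{
    return n / block_size;
}
```

Under that assumption a block size of 32 asks for 1024 threads per block, twice the limit, while a block size of 16 asks for 256, which fits.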

Some additional information:

N      blocksize
256    32         --> false
256    16         --> true
512    16         --> true
512    32         --> true

Those are some of the parameters I tested.


Intel quad Xeon X5355 @ 2.66GHz

2GB ram

Nvidia 8800GTS 320MB

Linux Fedora core 6

CUDA SDK Version 1.0 for Linux

Below is part of the incorrect output of the printDiff function:

blocksize: 32 

Number of elements: 256 

Processing time GPU: 2.495000 (ms) 

Name: GeForce 8800 GTS

TotalGlobalMem: 334823424

SharedMemPerBlocks: 16384

RegsPerBlock: 8192

Processing time CPU: 36.185001 (ms) 

Test FAILED

diff(0,0) CPU=256.000000, GPU=2368.000000

diff(1,0) CPU=256.000000, GPU=2368.000000

diff(2,0) CPU=256.000000, GPU=2368.000000

diff(3,0) CPU=256.000000, GPU=2368.000000

diff(4,0) CPU=256.000000, GPU=2368.000000

diff(5,0) CPU=256.000000, GPU=2368.000000

diff(6,0) CPU=256.000000, GPU=2368.000000

diff(7,0) CPU=256.000000, GPU=2368.000000

diff(8,0) CPU=256.000000, GPU=2368.000000

diff(9,0) CPU=256.000000, GPU=2368.000000




diff(253,255) CPU=256.000000, GPU=1.000000

diff(254,255) CPU=256.000000, GPU=1.000000

diff(255,255) CPU=256.000000, GPU=1.000000

Total Errors = 65536

Press ENTER to exit...

Can someone please help me?



If you are still having a problem would you be willing to zip up your code and post it as an attachment?

I still have the issue, and here is the code. I hope you see the same problem as I do.

If we use block size 32 with any given matrix size, we get errors; if we then use block size 16, we get no errors. Returning to block size 32 afterwards, we also get no errors.
matrixMul.tar.gz (6.26 KB)
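One way to tell "the kernel computed wrong values" apart from "the kernel never wrote anything" is to fill the result buffer with a sentinel before each run and count how many elements still hold it afterwards. If the block size 32 launch is being rejected, the buffer would keep whatever it held before, which would be consistent with the result changing after a run with block size 16; that is only a guess. The sketch below is plain C with hypothetical names, and applying it to the device buffer (for instance by copying a poisoned host buffer to the device before the launch) is an assumption about how the attached code is organised.

```c
/* Sentinel that a product of all-ones matrices can never contain. */
#define SENTINEL (-1.0f)

/* Overwrite every element of the result buffer with the sentinel. */
void poison(float *buf, int count)
{
    for (int i = 0; i < count; ++i)
        buf[i] = SENTINEL;
}

/* Count elements that still hold the sentinel, i.e. were never written. */
int count_unwritten(const float *buf, int count)
{
    int unwritten = 0;
    for (int i = 0; i < count; ++i) {
        if (buf[i] == SENTINEL)
            ++unwritten;
    }
    return unwritten;
}
```

If count_unwritten returns N*N after the run, nothing was written at all, and the values printDiff reports are leftovers rather than results.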