Hi all,
I have two questions:
1.
I’m using the matrixMul example as a benchmark to see the difference in time between the CPU and GPU. To do this I made some changes to the original matrixMul code.
<b>*</b> The matrices A and B are both initialised to 1 (so every element of each matrix is 1.0f);
<b>*</b> The matrices are always square (N*N).
Now I have the following problem. When I take a blocksize of 32 and an N of 256 (65536 elements per matrix), printDiff tells me that the CPU and GPU results are not the same.
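For reference on what the correct result looks like: with both inputs filled with 1.0f, every element of the N*N product should equal N, so 256 for N = 256. Here is a minimal host-side sketch of that check in plain C. The function name is my own and is not part of the SDK sample.

```c
#include <stdlib.h>

/* Multiplies two n x n all-ones matrices on the CPU and returns how many
   elements of the product differ from n. Returns -1 if allocation fails.
   Hypothetical helper for illustration, not taken from the SDK sample. */
int count_ones_product_errors(int n)
{
    size_t count = (size_t)n * (size_t)n;
    float *a = malloc(count * sizeof(float));
    float *b = malloc(count * sizeof(float));
    float *c = malloc(count * sizeof(float));
    int errors = 0;

    if (a == NULL || b == NULL || c == NULL) {
        free(a);
        free(b);
        free(c);
        return -1;
    }

    /* Same initialisation as described above: every element is 1.0f. */
    for (size_t i = 0; i < count; ++i) {
        a[i] = 1.0f;
        b[i] = 1.0f;
    }

    /* Plain row-by-column product, row-major storage. */
    for (int row = 0; row < n; ++row) {
        for (int col = 0; col < n; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k) {
                sum += a[(size_t)row * n + k] * b[(size_t)k * n + col];
            }
            c[(size_t)row * n + col] = sum;
        }
    }

    /* Each element is a sum of n ones, which is exact in float for small n. */
    for (size_t i = 0; i < count; ++i) {
        if (c[i] != (float)n) {
            ++errors;
        }
    }

    free(a);
    free(b);
    free(c);
    return errors;
}
```

Calling `count_ones_product_errors(256)` should report no mismatches on the CPU side, which is consistent with the CPU=256.000000 values in the output further down.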
2.
My second question: when I set the blocksize to 32, does the number of threads per block become 1024 (32*32), or are the two independent of each other? I ask because the maximum number of threads per block is 512, and the maximum per multiprocessor is 768 threads. Yet with a blocksize of 32 and an N of 512, the CPU and GPU results are the same.
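To make the arithmetic behind this question concrete, here is a small host-side sketch in plain C. It assumes the kernel is launched with a square block of blocksize x blocksize threads, as I believe the SDK matrixMul sample does, and it takes the 512-thread limit from the figure quoted above. The function names are my own.

```c
#include <stdio.h>

/* Per-block thread limit quoted in this post; treated as an assumption. */
#define MAX_THREADS_PER_BLOCK 512

/* Threads in one block, assuming a square block_size x block_size launch. */
int threads_per_block(int block_size)
{
    return block_size * block_size;
}

/* Returns 1 if the block stays within the given per-block limit, else 0. */
int block_fits(int block_size, int max_threads)
{
    return threads_per_block(block_size) <= max_threads;
}

/* Prints the thread count for one block size and whether it fits. */
void report_block(int block_size)
{
    printf("blocksize %d: %d threads per block, %s the limit of %d\n",
           block_size,
           threads_per_block(block_size),
           block_fits(block_size, MAX_THREADS_PER_BLOCK) ? "within" : "over",
           MAX_THREADS_PER_BLOCK);
}
```

Under that assumption, a blocksize of 16 gives 256 threads per block, which fits, while a blocksize of 32 gives 1024, which is over the limit regardless of N.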
Some additional information:
N    blocksize  CPU and GPU results match
256  32         false
256  16         true
512  16         true
512  32         true
These are some of the parameter combinations I tested.
System information:
Intel quad Xeon X5355 @ 2.66GHz
2GB ram
Nvidia 8800GTS 320MB
Linux Fedora core 6
CUDA SDK Version 1.0 for Linux
Below is part of the incorrect output of the printDiff function:
blocksize: 32
Number of elements: 256
Processing time GPU: 2.495000 (ms)
Name: GeForce 8800 GTS
TotalGlobalMem: 334823424
SharedMemPerBlocks: 16384
RegsPerBlock: 8192
Processing time CPU: 36.185001 (ms)
Test FAILED
diff(0,0) CPU=256.000000, GPU=2368.000000
diff(1,0) CPU=256.000000, GPU=2368.000000
diff(2,0) CPU=256.000000, GPU=2368.000000
diff(3,0) CPU=256.000000, GPU=2368.000000
diff(4,0) CPU=256.000000, GPU=2368.000000
diff(5,0) CPU=256.000000, GPU=2368.000000
diff(6,0) CPU=256.000000, GPU=2368.000000
diff(7,0) CPU=256.000000, GPU=2368.000000
diff(8,0) CPU=256.000000, GPU=2368.000000
diff(9,0) CPU=256.000000, GPU=2368.000000
..
..
..
diff(253,255) CPU=256.000000, GPU=1.000000
diff(254,255) CPU=256.000000, GPU=1.000000
diff(255,255) CPU=256.000000, GPU=1.000000
Total Errors = 65536
Press ENTER to exit...
Can someone please help me?
Thanks,
Jordy