I am trying NVIDIA's matrixMul example. To run it, I chose:
    #define BLOCKSIZE 10
    #define N 1000
But when I read cuda_profile.log after the run, I see this:
method=[ _Z15matrixMulKernelPfS_S_ii ] gputime=[ 543612.352 ] cputime=[ 543727.000 ] occupancy=[ 0.667 ] gld_coherent=[ 60000 ] gld_incoherent=[ 99760000 ] gst_coherent=[ 0 ] gst_incoherent=[ 1000000 ]
Why is gst_incoherent equal to 1000000?
I ran this example on an NVIDIA Quadro FX 1700 with the following characteristics:
    Major revision number: 1
    Minor revision number: 1
    Total amount of global memory: 536150016 bytes
    Number of multiprocessors: 4
    Number of cores: 32
    Total amount of constant memory: 65536 bytes
    Total amount of shared memory per block: 16384 bytes
    Total number of registers available per block: 8192
    Warp size: 32
    Maximum number of threads per block: 512
    Maximum sizes of each dimension of a block: 512 x 512 x 64
    Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
    Maximum memory pitch: 262144 bytes
    Texture alignment: 256 bytes
    Clock rate: 0.92 GHz
    Concurrent copy and execution: Yes
I want to get the maximum performance out of my graphics card for this matrixMul example. Can you help me determine the appropriate BLOCKSIZE and grid dimensions?
Thanks for your help!