I’m new to CUDA. I was asked to write a small program in C and an equivalent one in CUDA, and to compare their execution times.
I started from the “Transpose” project in the SDK and modified it to implement my function, which sets every negative value to zero and leaves every positive value unchanged.
I have attached my C function, my CUDA function, and the comparison function (in PDF format, because I can’t attach a “.cu” file).
I have a problem with the function CUT_CHECK_ERROR(). As soon as size_x > 17 or size_y > 33, my test no longer passes.
The test passes with size_x <= 16 and size_y <= 32.
Above those sizes the kernel itself doesn’t fail (which is expected, because the thread block is always 16,16,1), but the results of my C code and my CUDA code differ. I added some “printf” calls to find out which of the two is wrong, and it appears to be the CUDA code. I don’t understand why, because it works for smaller sizes.
Your index calculation does not include blockIdx.x and/or blockIdx.y, so all of your blocks are operating on the same block of global memory instead of on separate areas. As a result, only the values in the first block of your data are calculated (multiple times :D)
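Since the original code was only attached as a PDF, here is a minimal sketch of what a per-block index could look like, not your actual kernel. It assumes a row-major float matrix that is size_x wide and size_y high, and the names clampNegatives, clampNegativesCpu and clampNegativesGpu are made up for illustration. The grid is rounded up so that sizes that are not multiples of 16 are still covered, and the bounds check keeps the extra threads from writing outside the matrix.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

#define BLOCK_DIM 16  // the 16x16x1 thread block mentioned in the question

// Hypothetical kernel: writes 0 for negative inputs, copies everything else.
__global__ void clampNegatives(const float* in, float* out, int size_x, int size_y)
{
    // Global coordinates = offset of this block + position inside the block.
    // Leaving out the blockIdx terms makes every block touch the same tile.
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    // Threads past the matrix edge (partial blocks) must do nothing.
    if (x < size_x && y < size_y) {
        int index = y * size_x + x;  // assumed row-major layout
        out[index] = in[index] < 0.0f ? 0.0f : in[index];
    }
}

// CPU reference with the same semantics, for comparing results.
void clampNegativesCpu(const float* in, float* out, int size_x, int size_y)
{
    for (int i = 0; i < size_x * size_y; ++i)
        out[i] = in[i] < 0.0f ? 0.0f : in[i];
}

// Hypothetical host wrapper: copies the input to the device, launches the
// kernel, copies the result back. Returns the first CUDA error encountered.
cudaError_t clampNegativesGpu(const float* h_in, float* h_out, int size_x, int size_y)
{
    size_t bytes = (size_t)size_x * size_y * sizeof(float);
    float* d_in = NULL;
    float* d_out = NULL;

    cudaError_t err = cudaMalloc((void**)&d_in, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void**)&d_out, bytes);
    if (err != cudaSuccess) { cudaFree(d_in); return err; }

    err = cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 threads(BLOCK_DIM, BLOCK_DIM, 1);
        // Round up so the grid covers the whole matrix.
        dim3 grid((size_x + BLOCK_DIM - 1) / BLOCK_DIM,
                  (size_y + BLOCK_DIM - 1) / BLOCK_DIM, 1);
        clampNegatives<<<grid, threads>>>(d_in, d_out, size_x, size_y);
        err = cudaGetLastError();  // reports launch errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return err;
}
```

To check it, call clampNegativesGpu and clampNegativesCpu on the same input with sizes larger than one block (for example size_x = 40, size_y = 70) and compare the two output arrays element by element.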