Development with CUDA

Good morning,

I’m new to CUDA. I was asked to write a small program in C and a small one in CUDA and to compare their execution times.

I started from the “Transpose” project in the SDK and modified it to implement my function, which sets every negative value to zero and keeps the value unchanged if it is positive.

I have attached my C function, my CUDA function, and the comparison function (in PDF format because I can’t attach a “.cu” file).

I have a problem with the function CUT_CHECK_ERROR(). As soon as size_x > 16 or size_y > 32, my TEST doesn’t pass anymore.

Thank you in advance for any help you can give me.
transpose.pdf (217 KB)
transpose_kernel.pdf (178 KB)
transpose_gold.cpp (940 Bytes)
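
For readers without the attachments, the operation described in this post can be sketched in plain C as below. This is not the attached code: the function name clamp_negative_gold, the float element type, and the flat array layout are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical CPU reference (not the attached transpose_gold.cpp):
   copies idata to odata, replacing every negative value with zero.
   The float element type and the name are assumptions. */
void clamp_negative_gold(float *odata, const float *idata, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* Keep positive values as they are; everything else becomes 0. */
        odata[i] = idata[i] > 0.0f ? idata[i] : 0.0f;
    }
}
```

A GPU version would apply the same per-element rule, with one thread handling each element.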

[codebox]

const unsigned int size_x = 16;

const unsigned int size_y = 32;

transpose_naive<<< 1, size_x*size_y >>>(d_odata, d_idata);

[/codebox]

size_x*size_y = 16*32 = 512

The maximum sizes of the x-, y-, and z-dimensions of a thread block are 512, 512, and 64, respectively, and a block can contain at most 512 threads in total.

17*32 = 544 > 512

16*33 = 528 > 512

The kernel probably failed to launch. It’s good practice to catch these errors using cudaGetLastError.
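
The arithmetic above can also be checked on the host before launching. The following is a minimal plain-C sketch, not a CUDA API: the function name block_fits is hypothetical, the per-dimension limits are the ones quoted in this reply, and the 512-threads-per-block total is an assumption that holds for GPUs of this generation. Newer devices have different limits.

```c
#include <stdbool.h>

/* Limits for the GPUs discussed in this thread; they are assumptions
   here and differ on other hardware generations. */
enum {
    MAX_BLOCK_X = 512,
    MAX_BLOCK_Y = 512,
    MAX_BLOCK_Z = 64,
    MAX_THREADS_PER_BLOCK = 512
};

/* Hypothetical helper: returns true if a thread block of x*y*z threads
   respects both the per-dimension limits and the total thread limit. */
bool block_fits(unsigned x, unsigned y, unsigned z)
{
    return x >= 1 && y >= 1 && z >= 1 &&
           x <= MAX_BLOCK_X && y <= MAX_BLOCK_Y && z <= MAX_BLOCK_Z &&
           x * y * z <= MAX_THREADS_PER_BLOCK;
}
```

With these limits, block_fits(16 * 32, 1, 1) is true, while block_fits(17 * 32, 1, 1) and block_fits(16 * 33, 1, 1) are false, which matches the sizes at which the test starts failing.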

N.

Thanks a lot.

I have another question.

If I write:

// transpose initial code : (thread block 16,16,1)
dim3 grid(size_x / BLOCK_DIM, size_y / BLOCK_DIM, 1);
dim3 threads(BLOCK_DIM, BLOCK_DIM, 1); // BLOCK_DIM = 16
transpose_naive<<< grid, threads >>>(d_odata, d_idata);

My test passes with size_x <= 16 and size_y <= 32.
When the sizes are larger, the kernel doesn’t fail (which is expected, because the thread block is always 16,16,1), but the results of my C code and my CUDA code differ. I added some “printf” calls to see whether the CUDA or the C code is wrong, and it seems to be the CUDA code. I don’t understand why, because it works for smaller sizes.

[codebox]

unsigned int index_in = threadIdx.x;

[/codebox]

Your index calculation does not include blockIdx.x and/or blockIdx.y, so all of your blocks operate on the same block of global memory instead of on separate areas. Therefore only the values in the first block of your data are calculated (multiple times :D)
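
To illustrate the effect, here is a small CPU emulation in plain C of which elements such a grid would write. It is only a sketch under assumptions: the function name count_elements_written is hypothetical, the data is taken to be row-major, both sizes are taken to be multiples of BLOCK_DIM, and the loop variables bx, by, tx, ty stand in for blockIdx and threadIdx.

```c
#include <stdlib.h>

#define BLOCK_DIM 16 /* same block edge as in the launch code above */

/* Emulates a grid of BLOCK_DIM x BLOCK_DIM thread blocks on the CPU and
   returns how many distinct elements get written (0 if allocation fails).
   use_block_idx == 0 reproduces an index built from the thread index only. */
size_t count_elements_written(unsigned width, unsigned height, int use_block_idx)
{
    size_t n = (size_t)width * height, touched = 0;
    unsigned char *written = calloc(n, 1);
    if (!written) return 0;

    for (unsigned by = 0; by < height / BLOCK_DIM; ++by)
    for (unsigned bx = 0; bx < width / BLOCK_DIM; ++bx)
    for (unsigned ty = 0; ty < BLOCK_DIM; ++ty)
    for (unsigned tx = 0; tx < BLOCK_DIM; ++tx) {
        /* Global coordinates: block offset plus position inside the block. */
        unsigned x = tx + (use_block_idx ? bx * BLOCK_DIM : 0);
        unsigned y = ty + (use_block_idx ? by * BLOCK_DIM : 0);
        size_t index = (size_t)y * width + x;
        if (!written[index]) { written[index] = 1; ++touched; }
    }
    free(written);
    return touched;
}
```

For a 32 x 32 input, the version with the block offset touches all 1024 elements, while the version without it touches only the 256 elements of the first block. In a kernel, the equivalent of x would be blockIdx.x * BLOCK_DIM + threadIdx.x, and likewise for y.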

N.

It works.

Thank you. :rolleyes:

I should choose a girl’s name next time I join a forum… might get better replies :P

Oh, that’s not fair!, I try to help out everybody:

http://forums.nvidia.com/index.php?showtopic=101835 :)

PS. I would’ve been faster if it were Sopie90 :P

N.

haha, nico. I wasn’t directing that at anyone. I was just making some fun :)

lol …

I’m happy to be a girl if it really works like that. :rolleyes: