Development with Cuda

sophie92 · July 17, 2009, 8:55am

Good morning,

I’m new to CUDA. I have been asked to write a small program in C and a small one in CUDA, and to compare their execution times.

I started from the “Transpose” project in the SDK and modified it to implement my function, which sets every negative value to zero and keeps every positive value unchanged.

I have attached my C function, my CUDA function, and the comparison function (in PDF format because I can’t attach a “.cu” file).

I have a problem with the function CUT_CHECK_ERROR(). As soon as size_x > 16 or size_y > 32, my TEST no longer passes.

Thank you for any help you can give me.
transpose.pdf (217 KB)
transpose_kernel.pdf (178 KB)
transpose_gold.cpp (940 Bytes)
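
For readers without the attachments, the operation described in the post above can be sketched in plain C roughly as follows. This is only an illustration: the name `clamp_negative_gold`, the `float` element type, and the flat array layout are assumptions, not the contents of the attached files.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference: copies `in` to `out`, replacing every
   negative value with zero and leaving all other values unchanged.
   The float element type is an assumption. */
void clamp_negative_gold(float *out, const float *in, size_t n)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i] = in[i] < 0.0f ? 0.0f : in[i];
    }
}
```

A reference like this is what the GPU result would typically be compared against, element by element.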

Nico · July 17, 2009, 9:22am

[codebox]
const unsigned int size_x = 16;
const unsigned int size_y = 32;
transpose_naive<<< 1, size_x*size_y >>>(d_odata, d_idata);
[/codebox]

size_x*size_y = 16*32 = 512

A thread block can hold at most 512 threads in total; the maximum sizes of its x-, y-, and z-dimensions are 512, 512, and 64, respectively.

17*32 = 544 > 512

16*33 = 528 > 512

The kernel probably failed to launch; it’s good practice to catch these errors using cudaGetLastError.

N.
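
The arithmetic in the reply above can be checked on the host before launching. The sketch below is plain C, not CUDA; the helper name `fits_in_one_block` is hypothetical, and the constant 512 is simply the limit quoted in this thread (on a real device it would be safer to query the limit than to hard-code it).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 512 is the per-block thread limit quoted in the thread. */
enum { MAX_THREADS_PER_BLOCK = 512 };

/* Returns true when a launch of the form <<< 1, size_x * size_y >>>
   stays within the per-block thread limit. */
bool fits_in_one_block(unsigned int size_x, unsigned int size_y)
{
    assert(size_x > 0 && size_y > 0);
    /* Widen before multiplying so large inputs cannot wrap around. */
    unsigned long long threads = (unsigned long long)size_x * size_y;
    return threads <= MAX_THREADS_PER_BLOCK;
}
```

With these values, 16 x 32 fits exactly, while 17 x 32 (544) and 16 x 33 (528) do not, which matches the sizes at which the test started failing.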

sophie92 · July 17, 2009, 12:17pm

Thanks a lot.

I have another question.

If I write:

// transpose initial code : (thread block 16,16,1)
dim3 grid(size_x / BLOCK_DIM, size_y / BLOCK_DIM, 1);
dim3 threads(BLOCK_DIM, BLOCK_DIM, 1); // BLOCK_DIM = 16
transpose_naive<<< grid, threads >>>(d_odata, d_idata);

My test passes with size_x <= 16 and size_y <= 32.
For larger sizes the kernel no longer fails to launch (as expected, since the thread block is always 16,16,1), but the results of my C code and my CUDA code differ. I added some “printf” calls to see whether the CUDA or the C code is wrong, and it seems to be the CUDA code. I don’t understand why, because it works for smaller sizes.

Nico · July 17, 2009, 12:55pm

[codebox]
unsigned int index_in = threadIdx.x;
[/codebox]

Your calculation of index does not include blockIdx.x and/or blockIdx.y, so in fact all of your blocks are operating on the same block of global memory instead of on separate areas. Therefore only the values in the first block of your data will be calculated (multiple times :D)

N.
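
To make the effect described above concrete, here is a small CPU emulation in plain C. It is a simplified, one-dimensional sketch: the function `emulate_launch` and its parameters are hypothetical, the loops merely stand in for `blockIdx.x` and `threadIdx.x`, and the real kernel in this thread uses a two-dimensional grid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical emulation of a 1D launch of grid_dim blocks with
   block_dim threads each.
   use_block_idx == 0: index = threadIdx.x only (the bug).
   use_block_idx != 0: index = blockIdx.x * blockDim.x + threadIdx.x. */
void emulate_launch(float *out, const float *in, size_t n,
                    unsigned int grid_dim, unsigned int block_dim,
                    int use_block_idx)
{
    assert(out != NULL && in != NULL);
    for (unsigned int block_idx = 0; block_idx < grid_dim; ++block_idx) {
        for (unsigned int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            size_t index = thread_idx;
            if (use_block_idx) {
                index += (size_t)block_idx * block_dim;
            }
            if (index < n) {
                /* Same operation as in the thread: negatives become zero. */
                out[index] = in[index] < 0.0f ? 0.0f : in[index];
            }
        }
    }
}
```

With `use_block_idx` set to 0, every block rewrites the first `block_dim` elements and the rest of `out` is never touched; with it set to 1, each block covers its own slice of the data.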

sophie92 · July 17, 2009, 2:54pm

It works.

Thank you. :rolleyes:

rewolf · July 17, 2009, 3:55pm

I should choose a girl’s name next time I join a forum… might get better replies :P

Nico · July 17, 2009, 4:12pm

Oh, that’s not fair!, I try to help out everybody:

http://forums.nvidia.com/index.php?showtopic=101835 :)

PS. I would’ve been faster if it were Sopie90 :P

N.

rewolf · July 17, 2009, 4:23pm

haha, nico. I wasn’t directing that at anyone. I was just making some fun :)

sophie92 · July 20, 2009, 8:33am

lol …

I’m happy to be a girl if it really works like that. :rolleyes:
