I can't understand how the kernel works

Hi,
First, I apologize for my poor English.
I’m a CUDA beginner, and this is my first time doing parallel programming,
so I don’t fully understand how a kernel works.

I’m looking at the ‘cpp_integration’ sample code in the SDK projects folder.
To understand how the kernel works, I modified the code a little.

For example, I made the two changes below.


from:
const unsigned int num_threads = len / 4;
to:
const unsigned int num_threads = len / 2;

from:
g_data[tid] = ((((data << 0) >> 24) - 10) << 24)
| ((((data << 8) >> 24) - 10) << 16)
| ((((data << 16) >> 24) - 10) << 8)
| ((((data << 24) >> 24) - 10) << 0);
to:
g_data[tid] = ((((data << 0) >> 24) - 10) << 24)
| ((((data << 24) >> 24) - 10) << 0);


I made the second change because I thought the expression assigned to g_data
should have as many OR-ed terms as ‘len/num_threads’.
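
For reference, the arithmetic in the original four-term expression can be reproduced on the CPU. The following is only a minimal sketch, not code from the SDK sample: the names shift_word and decode_on_cpu are hypothetical, the loop index stands in for the thread index tid, and it assumes that each 32-bit element of g_data packs four characters and that every byte is at least 10.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU stand-in for the work of one thread: subtract 10 from
   each of the four bytes packed into one 32-bit word.
   Assumption: every byte is >= 10, so no subtraction wraps around. */
static uint32_t shift_word(uint32_t data)
{
    return ((((data <<  0) >> 24) - 10) << 24)   /* highest byte */
         | ((((data <<  8) >> 24) - 10) << 16)
         | ((((data << 16) >> 24) - 10) <<  8)
         | ((((data << 24) >> 24) - 10) <<  0);  /* lowest byte  */
}

/* Emulates len / 4 threads: iteration tid plays the role of thread tid
   and handles the four characters stored in word number tid. */
static void decode_on_cpu(char *buf, size_t len)
{
    assert(len % 4 == 0);  /* assumption: buffer is a whole number of words */
    size_t num_threads = len / 4;
    for (size_t tid = 0; tid < num_threads; ++tid) {
        uint32_t word;
        memcpy(&word, buf + 4 * tid, sizeof word);  /* like data = g_data[tid] */
        word = shift_word(word);
        memcpy(buf + 4 * tid, &word, sizeof word);  /* like g_data[tid] = ...  */
    }
}
```

Calling decode_on_cpu on a 16-byte buffer whose characters were each shifted up by 10 is one way to compare the CPU result with what the kernel writes back.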

When I built the modified program, there were no errors or warnings,
but it produced the wrong result.
(The original program prints ‘Hello world.’, but the sentence printed by the modified program was garbled.)
What is the problem?

I also read the ‘CUDA programming guide’, but I still don’t understand how the kernel works.
Please tell me how I can learn about CUDA kernels.

Thanks in advance.