Number of threads vs. number of data elements

Some concepts about memory copy are not quite clear to me. My questions may be quite simple and stupid:

  1. Suppose there is a float array containing 100 elements in device memory, and I have 256 threads (only 1 block in total). If I use threadIdx.x as the index into the array, it will crash, am I right?

  2. Suppose there is a float array containing 100 elements in device memory, and I have 32 threads (only 1 block in total). Since there are fewer threads than elements, each thread basically needs to deal with several numbers, so I use data[threadIdx.x * n] where n = 1, 2, 3, 4. Is this correct? Sometimes the index will then exceed 100; what will happen in that case? (This question is almost the same as the first one.)

Thank you for help!

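Just like regular CPU coding, you have to make sure you don't access out-of-bounds items. An out-of-bounds access is undefined behavior: it might crash, silently corrupt other memory, or appear to work.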

You'll have to add if statements to your code to make sure that the threads don't access items beyond the end of the array.
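As a rough sketch of such a bounds check (not a definitive implementation): the kernel names `scaleGuarded` and `scaleStrided`, the host helper `runScale`, and the "double every element" operation are all made up for illustration, and error checking of the CUDA calls is left out for brevity. The first kernel covers the 256-thread case with a plain if. The second covers the 32-thread case with a block-stride loop, which is a different indexing scheme from the data[threadIdx.x * n] in the question; that indexing would visit some elements more than once, skip others, and reach index 124.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: doubles each element. The guard keeps threads
// whose index is >= n from touching memory past the end of the array.
__global__ void scaleGuarded(float *data, int n)
{
    int i = threadIdx.x;
    if (i < n) {
        data[i] *= 2.0f;
    }
}

// Hypothetical kernel for fewer threads than elements: each thread
// starts at its own index and steps by the block size, so the loop
// condition doubles as the bounds check.
__global__ void scaleStrided(float *data, int n)
{
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        data[i] *= 2.0f;
    }
}

// Hypothetical host helper: runs one of the kernels on n floats with a
// single block of `threads` threads and returns how many elements came
// back doubled. CUDA error checking is omitted to keep the sketch short.
int runScale(int n, int threads, bool strided)
{
    float *host = new float[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    float *dev = nullptr;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    if (strided) {
        scaleStrided<<<1, threads>>>(dev, n);
    } else {
        scaleGuarded<<<1, threads>>>(dev, n);
    }

    // This blocking copy waits for the kernel to finish first.
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    int ok = 0;
    for (int i = 0; i < n; ++i) {
        if (host[i] == 2.0f * i) ++ok;
    }
    delete[] host;
    return ok;
}
```

Calling runScale(100, 256, false) exercises the first case from the question and runScale(100, 32, true) the second; in both, only the 100 real elements are ever touched.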

A common alternative is to pad your array.

So in your first example you'd allocate an array of 256 elements, and in your second example an array of 32 * 4 = 128 elements.

The host code will just ignore the results past item 100 (the padded values).
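The padding approach might look roughly like the sketch below. The names `scalePadded` and `runPadded`, the `itemsPerThread` parameter, and the doubling operation are assumptions made for the example, and error checking is again omitted. The kernel has no bounds check at all; it is safe only because the buffer holds threads * itemsPerThread elements, which is assumed to be at least n.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel with no bounds check. Each thread handles
// itemsPerThread elements, spaced blockDim.x apart, so the block as a
// whole touches exactly blockDim.x * itemsPerThread elements.
__global__ void scalePadded(float *data, int itemsPerThread)
{
    for (int k = 0; k < itemsPerThread; ++k) {
        data[threadIdx.x + k * blockDim.x] *= 2.0f;
    }
}

// Hypothetical host helper: pads n real elements up to
// threads * itemsPerThread, runs the kernel on the padded buffer, and
// returns how many of the first n elements came back doubled.
int runPadded(int n, int threads, int itemsPerThread)
{
    int padded = threads * itemsPerThread;   // assumed to be >= n
    float *host = new float[padded]();       // padding starts as 0.0f
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    float *dev = nullptr;
    cudaMalloc((void **)&dev, padded * sizeof(float));
    cudaMemcpy(dev, host, padded * sizeof(float), cudaMemcpyHostToDevice);

    scalePadded<<<1, threads>>>(dev, itemsPerThread);

    // This blocking copy waits for the kernel to finish first.
    cudaMemcpy(host, dev, padded * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    // Only the first n results are meaningful; the padded tail is ignored.
    int ok = 0;
    for (int i = 0; i < n; ++i) {
        if (host[i] == 2.0f * i) ++ok;
    }
    delete[] host;
    return ok;
}
```

Here runPadded(100, 256, 1) corresponds to the 256-element buffer and runPadded(100, 32, 4) to the 32 * 4 one. The trade-off is a little wasted memory and work on the padding in exchange for a kernel without branches.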

eyal

OK, thank you so much! Now I understand.