Thread question

Hi,
If I have a few independent tasks, i.e. computing a few formulas that could be done in different threads, will they actually run in parallel?
What I mean is if I write something like this:

if ( 0 == threadIdx.x )
{ ComputeA; ComputeB; ComputeC;}
if ( 1 == threadIdx.x )
{ ComputeD; ComputeE; ComputeF; }
if ( 2 == threadIdx.x )
{ ComputeG; ComputeH; ComputeI; }
__syncthreads();

will it be faster than doing just this:
if ( 0 == threadIdx.x )
{ ComputeA; ComputeB; ComputeC; ComputeD; ComputeE;…ComputeI; }
__syncthreads();

How about reading from different places in gmem from different threads? How does the warp (or half warp) affect this, if at all?

thanks
eyal

All threads of a warp execute the same code, so no, this will not be any faster than the second option.
What you can do is branch the execution according to warp (or half warp) boundaries.

Thanks for the reply… that's what I thought happens :)

But how do I do what you've suggested with the branching? Can you please post some sample code?

thanks

eyal

I have never done it myself really, but I would guess something along the lines of

if ((threadIdx.x & (16 - 1)) == 0)
    something();
else if ((threadIdx.x & (16 - 1)) == 1)
    somethingelse();

Now if you only need exactly one thread per "type" of computation, I'd say you're pretty much screwed, as you will only be using 1 SP per MP at all times.

But it's early and I haven't had coffee, so someone else will pick this up if I'm completely wrong!

Thanks… actually I tried just using threadIdx.x == 20, 40, 60, 80, 100, … and figured that would be outside half-warp boundaries, but didn't see any performance gain.

Any ideas are more than welcome

thanks

eyal

if you want to distribute work over several warps, you can use the following pattern:

thid_in_warp = threadIdx.x & 31; // linear thread id within one warp
warp_id = threadIdx.x >> 5;      // warp index

if (warp_id == 0) {
    // address threads with thid_in_warp
    block1
} else if (warp_id == 1) {
    block2
}

etc..