Set of lines of code executed by only one thread


  How can I make specific lines of a CUDA kernel execute on only a single thread?

  For example, memory has to be allocated for 50 elements (nodes of a linked list) inside a CUDA kernel on which 50 threads are working, i.e. each thread works on one element.

  But the memory should be allocated only once.

  Is there any way to handle such a situation?



if ((threadIdx.x==0) && (threadIdx.y == 0) && (threadIdx.z==0))
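
To illustrate how this guard might be used for the allocation in the question, here is a minimal sketch for a single 1-D block. The names Node, build_list and run_build_list are made up for the example, and it assumes in-kernel malloc() is acceptable for your case (allocating on the host before the launch is often simpler). The __syncthreads() after the guard is what makes the other threads wait until the pointer is ready.

```cuda
#include <cuda_runtime.h>

// Hypothetical list node, only for this example.
struct Node {
    int   value;
    Node* next;
};

// Assumes a launch with ONE 1-D block, e.g. build_list<<<1, 50>>>(50, d_sum).
__global__ void build_list(int n, int* sum_out)
{
    __shared__ Node* nodes;   // one pointer shared by the whole block

    if (threadIdx.x == 0) {
        // Executed by exactly one thread of the block.
        nodes = static_cast<Node*>(malloc(n * sizeof(Node)));
    }
    __syncthreads();          // all threads wait until the pointer is written

    bool ok = (nodes != nullptr);
    if (ok && threadIdx.x < n) {
        // Each thread fills in its own node and links it to the next one.
        nodes[threadIdx.x].value = threadIdx.x;
        nodes[threadIdx.x].next =
            (threadIdx.x + 1 < n) ? &nodes[threadIdx.x + 1] : nullptr;
    }
    __syncthreads();          // all nodes are linked before the list is walked

    if (threadIdx.x == 0) {
        int sum = -1;         // -1 signals that the allocation failed
        if (ok) {
            sum = 0;
            for (Node* p = nodes; p != nullptr; p = p->next) sum += p->value;
            free(nodes);      // released once, by the thread that allocated it
        }
        *sum_out = sum;
    }
}

// Hypothetical host helper: launches the kernel and returns the sum of the
// node values, or -1 if something failed. Error checking is left out.
int run_build_list(int n)
{
    int* d_sum = nullptr;
    int  h_sum = -1;
    cudaMalloc(&d_sum, sizeof(int));
    build_list<<<1, n>>>(n, d_sum);
    cudaMemcpy(&h_sum, d_sum, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_sum);
    return h_sum;
}
```

Calling run_build_list(50) from main() walks the 50 nodes that were allocated once by thread 0. Note that both __syncthreads() calls sit outside the if, so every thread of the block reaches them.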



Will it work only for one block? If we have more than one block, the operation will be done by the first thread of every block, am I right?

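For multiple blocks, also test blockIdx so that only one block takes the branch, and use the __threadfence() function when other blocks need to see the result. Note that __threadfence() is a memory fence, not a barrier: it guarantees that writes the calling thread made before the fence become visible to threads in other blocks before any writes it makes after the fence, but it does not make the other blocks wait.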
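
One way to get "exactly one thread in the whole grid" together with __threadfence() is the last-block-done pattern, sketched below. This is only an illustration: blocks_done, sum_blocks and run_sum_blocks are made-up names, and the per-block work is a placeholder. Each block publishes a value, fences, then takes a ticket from an atomic counter; the thread holding the last ticket knows every other block has already published.

```cuda
#include <cuda_runtime.h>

// Hypothetical grid-wide counter of blocks that have finished.
__device__ unsigned int blocks_done = 0;

__global__ void sum_blocks(volatile int* partial, int* total)
{
    if (threadIdx.x == 0) {
        // Placeholder for the real per-block result.
        partial[blockIdx.x] = blockIdx.x + 1;

        // Make the write above visible device-wide before announcing
        // that this block is done.
        __threadfence();

        // atomicInc returns the old value and wraps the counter back to 0
        // once the old value equals gridDim.x - 1, so it is reset for the
        // next launch.
        unsigned int ticket = atomicInc(&blocks_done, gridDim.x - 1);

        if (ticket == gridDim.x - 1) {
            // Only thread 0 of the last block to arrive gets here,
            // i.e. one thread per launch.
            int sum = 0;
            for (unsigned int b = 0; b < gridDim.x; ++b) sum += partial[b];
            *total = sum;
        }
    }
}

// Hypothetical host helper: returns the total, or -1 if something failed.
// Error checking is left out.
int run_sum_blocks(int num_blocks)
{
    int* d_partial = nullptr;
    int* d_total   = nullptr;
    int  h_total   = -1;
    cudaMalloc(&d_partial, num_blocks * sizeof(int));
    cudaMalloc(&d_total, sizeof(int));
    sum_blocks<<<num_blocks, 32>>>(d_partial, d_total);
    cudaMemcpy(&h_total, d_total, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_partial);
    cudaFree(d_total);
    return h_total;
}
```

The partial array is declared volatile so the final loop reads the values from memory instead of a stale cached copy. If the one-time step has to run before the other blocks start working (as with an allocation), this pattern does not help; doing that step on the host or in a separate kernel launch is the usual way out.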