only-once-instruction in kernel

What I’ve got right now is a multithreaded kernel and a second kernel that executes only one instruction X in a single thread. I’d like to merge them, however something like

if ( condition that is only true in one of the threads )

is leading to strange results. The performance guidelines discourage control flow instructions, but in this case I would prefer them over using a second kernel. I wonder why it doesn’t work.

Another (related) question: I’m not sure what exactly happens if I do

__shared__ int c;

and increment it in a kernel with n threads. By how much will c be increased?

I would be grateful for any explanations or reading suggestions; I didn’t find an exact specification of this in the programming guide.


There is nothing wrong, even performance-wise, with using if statements and branches. Optimizing for divergent warps is the lowest-priority optimization and rarely (in my experience) improves performance.

There shouldn’t be anything wrong with your if statement unless it depends on data written to global memory by another thread.
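To illustrate the point about dependencies, here is a minimal sketch of a kernel in which one thread per block executes an extra instruction and the other threads then use its result. The kernel name, the `blockFactor` variable and the scaling operation are made up for illustration; they are not the original poster's code. The sketch assumes the dependency stays inside one block, where `__syncthreads()` placed outside the branch is enough. For data written by threads of another block, there is no comparable barrier inside a plain kernel launch, which is one reason a second kernel can give different results.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: thread 0 of each block performs one step exactly once
// per block, then every thread of the block uses the result.
__global__ void scaleWithOneTimeStep(float *data, int n, float factor)
{
    __shared__ float blockFactor;

    // Executed by a single thread per block (the "only-once" instruction).
    if (threadIdx.x == 0) {
        blockFactor = factor * 2.0f;
    }
    // The barrier sits outside the branch so that every thread reaches it
    // and sees the value written by thread 0.
    __syncthreads();

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        data[idx] *= blockFactor;
    }
}

int main()
{
    const int n = 256;
    float host[n];
    for (int k = 0; k < n; ++k) host[k] = 1.0f;

    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    scaleWithOneTimeStep<<<n / 64, 64>>>(dev, n, 1.5f);
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    // Every element was multiplied by factor * 2.
    printf("host[0] = %f\n", host[0]);
    return 0;
}
```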

__shared__ int c;

c will be incremented an undefined number of times, because the threads race on the same variable. This is invalid code. I thought the programming guide made this clear, but it’s been a while since I read it. The histogram sample in the SDK may be of interest to you.
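If the goal is a well-defined count, one option is an atomic increment. The following is only a sketch under assumptions: the kernel name and launch configuration are invented, and it relies on `atomicAdd` on shared memory, which needs a device that supports shared-memory atomics (compute capability 1.2 or later).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread increments a per-block shared counter
// atomically, so the final value equals the number of threads in the block.
__global__ void countThreads(int *result)
{
    __shared__ int c;

    // One thread initializes the counter; shared memory is not zeroed for us.
    if (threadIdx.x == 0) {
        c = 0;
    }
    __syncthreads();

    // Atomic read-modify-write: no increment is lost.
    atomicAdd(&c, 1);
    __syncthreads();

    // One thread publishes the block's result.
    if (threadIdx.x == 0) {
        result[blockIdx.x] = c;
    }
}

int main()
{
    const int blocks = 2;
    const int threads = 128;
    int host[blocks];

    int *dev = nullptr;
    cudaMalloc(&dev, blocks * sizeof(int));
    countThreads<<<blocks, threads>>>(dev);
    cudaMemcpy(host, dev, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    // With atomicAdd, each block reports its own thread count (128 here).
    for (int b = 0; b < blocks; ++b) {
        printf("block %d: c = %d\n", b, host[b]);
    }
    return 0;
}
```

Replacing `atomicAdd(&c, 1)` with a plain `c++` brings back the race described above.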

Thank you for your quick answer.
Hmm, that’s what I thought.
Here is what I want to do:
c[i*n+j]= (i==j) ? sqrt(sum) : sum;
where i is a parameter to the kernel and j is the unique block-index.
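For reference, the statement above could be embedded in a kernel roughly as follows. This is a sketch, not the poster's kernel: how `sum` is computed is not shown in the thread, so a hypothetical input array `sums` with one precomputed value per block stands in for it, and the kernel name and sizes are assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: block j writes entry (i, j) of an n x n matrix c.
// `sums` is a placeholder for whatever per-block value the real kernel computes.
__global__ void writeEntry(float *c, const float *sums, int i, int n)
{
    int j = blockIdx.x;       // unique block index
    float sum = sums[j];

    // Only one thread per block performs the write; the diagonal entry
    // (i == j) gets the square root, every other entry the plain sum.
    if (threadIdx.x == 0) {
        c[i * n + j] = (i == j) ? sqrtf(sum) : sum;
    }
}

int main()
{
    const int n = 4;
    const int i = 2;
    float hostSums[n] = {4.0f, 9.0f, 16.0f, 25.0f};
    float hostC[n * n];

    float *devC = nullptr, *devSums = nullptr;
    cudaMalloc(&devC, n * n * sizeof(float));
    cudaMalloc(&devSums, n * sizeof(float));
    cudaMemset(devC, 0, n * n * sizeof(float));
    cudaMemcpy(devSums, hostSums, n * sizeof(float), cudaMemcpyHostToDevice);

    writeEntry<<<n, 32>>>(devC, devSums, i, n);
    cudaMemcpy(hostC, devC, n * n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(devC);
    cudaFree(devSums);

    // Row i holds the sums, except column i, which holds sqrt(16) = 4.
    for (int j = 0; j < n; ++j) {
        printf("c[%d][%d] = %f\n", i, j, hostC[i * n + j]);
    }
    return 0;
}
```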
Now I wonder why the result is different (i.e. correct) when I write instead
and subsequently call a single-thread kernel that does

If you add volatile and execute this with a block size of at most 32 threads (the warp size), c will be increased by exactly 1. As all threads of a warp execute in lockstep, they all read c and then all write c+1.

In case of multiple warps you can never be sure.