Conditional assignments: which is fastest?

Hello,

Simple question: which one is fastest? Or are they all the same?

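bool cond = somecondition;

// Option 1: if/else
if (cond)
{
	globalArray[tidx] = 123;
}
else
{
	globalArray[tidx] = -1;
}

or

// Option 2: write a default, then overwrite it
globalArray[tidx] = -1;
if (cond)
{
	globalArray[tidx] = 123;
}

or

// Option 3: branch-free arithmetic
globalArray[tidx] = (int)cond * 123 + (int)!cond * (-1);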
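To make the difference between the three options concrete, here is a small host-side C++ sketch (not CUDA) that models one element of `globalArray` with a hypothetical `Slot` type that counts stores. The names `Slot`, `option1` to `option3` and `check_same_result` are made up for illustration, and the count reflects the stores as written in the source, not what nvcc finally emits.

```cpp
#include <cassert>

// Hypothetical stand-in for one element of globalArray: it keeps the last
// value written and counts how many stores reached it (source level only;
// the real compiler may merge or drop stores).
struct Slot {
    int value = 0;
    int stores = 0;
    void store(int v) {
        value = v;
        ++stores;
    }
};

// Option 1: if/else, exactly one store on either path.
Slot option1(bool cond) {
    Slot s;
    if (cond) {
        s.store(123);
    } else {
        s.store(-1);
    }
    return s;
}

// Option 2: unconditional default, then an overwrite when cond is true.
Slot option2(bool cond) {
    Slot s;
    s.store(-1);
    if (cond) {
        s.store(123);
    }
    return s;
}

// Option 3: branch-free arithmetic, exactly one store.
Slot option3(bool cond) {
    Slot s;
    s.store((int)cond * 123 + (int)!cond * (-1));
    return s;
}

// All three variants must leave the same final value in the slot.
void check_same_result(bool cond) {
    assert(option1(cond).value == option2(cond).value);
    assert(option2(cond).value == option3(cond).value);
}
```

All three leave the same value behind, but `option2(true)` reports two stores where the others report one; that extra global write is what option 2 costs whenever `cond` is true.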

globalArray[tidx] = cond ? 123 : -1;

That said, I assume the compiler will be able to optimize some of your other expressions as well. The most important optimization is to issue only one global memory transaction.
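The point of the ternary form is that the value is selected first and then written once. A minimal C++ sketch of that pattern, with the hypothetical helpers `select_ternary`, `select_arith`, `write_once` and `check_select` standing in for the per-thread code. On the GPU a simple ternary like this is often compiled to a select or predicated instruction rather than a branch, but the compiler is not obliged to do so.

```cpp
#include <cassert>

// Ternary form suggested in the reply: choose the value, no explicit branch.
int select_ternary(bool cond, int a, int b) {
    return cond ? a : b;
}

// Option 3 from the question, generalized to an arbitrary pair of values.
int select_arith(bool cond, int a, int b) {
    return (int)cond * a + (int)!cond * b;
}

// Hypothetical per-thread body (globalArray and tidx mirror the question's
// names): compute the value first, then write it exactly once.
void write_once(int* globalArray, int tidx, bool cond) {
    globalArray[tidx] = select_ternary(cond, 123, -1);
}

// Both formulations must pick the same value.
void check_select(bool cond, int a, int b) {
    assert(select_ternary(cond, a, b) == select_arith(cond, a, b));
}
```

For example, `check_select(true, 7, -5)` passes, and `write_once` leaves 123 or -1 in the slot depending on `cond`, with a single assignment either way.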

The speed will depend on the amount of warp divergence.

The third option you posted is the fastest when paths diverge within a warp, as it triggers only a single memory transaction (per half-warp). With a branch, the two sides of a divergent warp are executed one after the other, and each side issues its own store.
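The divergence argument can be illustrated with a deliberately simplified model, written as plain C++: a warp of 32 lanes issues a store instruction once for each control-flow path that at least one of its lanes takes, with the other lanes masked off. This is an assumption for illustration only; it counts store instructions per warp rather than the memory transactions per half-warp mentioned above, and it ignores the compiler's ability to replace short branches with predicated or select instructions. The names `kWarpSize` and `stores_option1` to `stores_option3` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Simplified SIMT model: one store instruction per path taken by any lane.
constexpr std::size_t kWarpSize = 32;

// Option 1 (if/else): one store for each side of the branch that any lane takes.
int stores_option1(const bool (&cond)[kWarpSize]) {
    bool anyTrue = false;
    bool anyFalse = false;
    for (bool c : cond) {
        if (c) {
            anyTrue = true;
        } else {
            anyFalse = true;
        }
    }
    return (anyTrue ? 1 : 0) + (anyFalse ? 1 : 0);
}

// Option 2: the default store is always issued; the overwrite is issued
// if any lane takes the branch.
int stores_option2(const bool (&cond)[kWarpSize]) {
    bool anyTrue = false;
    for (bool c : cond) {
        anyTrue = anyTrue || c;
    }
    return 1 + (anyTrue ? 1 : 0);
}

// Option 3 (branch-free): every lane executes the same single store.
int stores_option3(const bool (&)[kWarpSize]) {
    return 1;
}
```

With every lane taking the same `true` path, options 1 and 3 issue one store and option 2 issues two; with a warp whose lanes alternate between `true` and `false`, option 1 rises to two stores while option 3 stays at one.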

The compiler is not silly; it may well treat all three of your options the same way.

Option 2 is probably still the worst.

Just take a look at the PTX code (add --keep to the compiler flags); it isn't that hard to read.

Since a lot of optimization happens after the PTX stage, it is sometimes more helpful to disassemble the actual binaries using decuda, nv50dis/nvc0dis, or cuobjdump.

Wow, you are right! That is a good place to look as well ;)

Option 3 should be avoided because of the multiplications.
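If the multiplications are the concern, a branch-free select can also be written with bitwise operations only. A sketch, assuming two's-complement `int` (guaranteed since C++20 and what NVIDIA GPUs use in practice); the function names are hypothetical. Whether this actually beats option 3 or a plain ternary depends on the GPU and on what the compiler emits, so it is worth checking the generated code.

```cpp
#include <cassert>

// Multiplication-free, branch-free select (assumes two's complement).
// mask is all ones (-1) when cond is true and all zeros when it is false,
// so exactly one of a or b survives the AND.
int select_mask(bool cond, int a, int b) {
    int mask = -(int)cond;
    return (a & mask) | (b & ~mask);
}

// Specialized to the question's values: 124 = 123 - (-1), so the result is
// -1 + 124 when cond is true and -1 + 0 when it is false.
int select_123_or_minus1(bool cond) {
    return (-(int)cond & 124) - 1;
}
```

For the values in the question, `select_mask(cond, 123, -1)` and `select_123_or_minus1(cond)` both return 123 when `cond` is true and -1 otherwise.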
