conditional assignments, which is faster?

Raphael · October 6, 2010, 12:17pm

Hello,

Simple question: which of these is fastest? Or are they all the same?

bool cond = somecondition;
if (cond)
{
	globalArray[tidx] = 123;
}
else
{
	globalArray[tidx] = -1;
}

or

globalArray[tidx] = -1;
if (cond)
{
	globalArray[tidx] = 123;
}

or

globalArray[tidx] = (int)cond * 123 + (int)!cond*(-1);

tera · October 6, 2010, 1:24pm

globalArray[tidx] = cond ? 123 : -1;

I assume the compiler will be able to optimize some of your other expressions into the same thing, though. The most important optimization is to issue only one global memory transaction.
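To make the comparison concrete, here is a minimal host-side sketch in plain C++ of the variants discussed so far. It is not kernel code and not a benchmark: the function names and the `out` pointer standing in for `globalArray[tidx]` are made up for illustration. It only shows that all four forms store the same value; which one is fastest on the GPU still depends on what the compiler emits.

```cpp
#include <cassert>

// Option 1: if/else, one store on each path.
void store_if_else(int* out, bool cond) {
    if (cond) {
        *out = 123;
    } else {
        *out = -1;
    }
}

// Option 2: unconditional store, then a conditional overwrite.
// As written, this is up to two stores to the same location.
void store_then_overwrite(int* out, bool cond) {
    *out = -1;
    if (cond) {
        *out = 123;
    }
}

// Option 3: arithmetic selection followed by a single store.
void store_arithmetic(int* out, bool cond) {
    *out = (int)cond * 123 + (int)!cond * (-1);
}

// Ternary form: pick the value first, then a single store.
void store_ternary(int* out, bool cond) {
    *out = cond ? 123 : -1;
}

// Returns true if all four variants store the same value for `cond`.
bool variants_agree(bool cond) {
    int a = 0, b = 0, c = 0, d = 0;
    store_if_else(&a, cond);
    store_then_overwrite(&b, cond);
    store_arithmetic(&c, cond);
    store_ternary(&d, cond);
    return a == b && b == c && c == d && d == (cond ? 123 : -1);
}

// Checks both possible conditions.
void check_all() {
    assert(variants_agree(true));
    assert(variants_agree(false));
}
```

The bodies are ordinary C, so they could be pasted into a kernel with `globalArray[tidx]` in place of `*out`.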

cbuchner1 · October 6, 2010, 3:42pm

The speed will depend on the amount of warp divergence.

The third option you posted is the fastest when threads within a warp take divergent paths, as it triggers only a single memory transaction (per half-warp).

devkec · October 6, 2010, 5:47pm

The compiler is not silly; it may well treat all three of your options as the same code.

Option 2 is perhaps still the worst.

Just take a look at the PTX code (add --keep to the nvcc compile flags); it isn't that hard to read.

tera · October 6, 2010, 6:26pm

As a lot of optimization happens past the PTX stage, it’s sometimes more helpful to disassemble the actual binaries using ~~decuda, nv50dis/nvc0dis, or~~ cuobjdump.

devkec · October 11, 2010, 8:19pm

Wow, you are right! That might be another place to look, too ;)

Lev · October 11, 2010, 8:29pm

Option 3 is best avoided because of the multiplications.
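For reference, the two products in option 3 can be folded by hand. Since `(int)cond` is either 0 or 1, `cond*123 + (1 - cond)*(-1)` equals `124*cond - 1`. The sketch below, in plain C++, shows that form and a mask-based form with no multiplication at all. The helper names are hypothetical and neither rewrite comes from the thread; whether either beats the ternary is something only the disassembly can tell.

```cpp
#include <cassert>

// Option 3 as posted: two multiplications and an addition.
int select_two_mul(bool cond) {
    return (int)cond * 123 + (int)!cond * (-1);
}

// Same value with one multiplication: 123*c - (1 - c) = 124*c - 1.
int select_one_mul(bool cond) {
    return 124 * (int)cond - 1;
}

// Same value with no multiplication. Assumes two's complement ints,
// where -(int)cond is all ones for true and zero for false.
int select_no_mul(bool cond) {
    int mask = -(int)cond;     // -1 (all bits set) or 0
    return (mask & 124) - 1;   // 124 - 1 or 0 - 1
}

// Checks that the three forms agree for both conditions.
void check_selects() {
    assert(select_two_mul(true) == 123 && select_two_mul(false) == -1);
    assert(select_one_mul(true) == 123 && select_one_mul(false) == -1);
    assert(select_no_mul(true) == 123 && select_no_mul(false) == -1);
}
```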
