4 variables in one register: how to implement it?

I have a question: is there a simple way to pack 4 unsigned chars into one register?

If I were coding for the CPU I would, for example, do it with bit shifts and masks:
var1 would be (reg1 & 0xff)
var2 - ((reg1 >> 8) & 0xff)
var3 - ((reg1 >> 16) & 0xff)
var4 - ((reg1 >> 24) & 0xff)

reg1 is declared as unsigned int.
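The shift-and-mask scheme above can be sketched as a pair of helpers. This is only a minimal sketch: the names pack4, unpack4 and the HD macro are made up for illustration and are not from the original code. The macro is there so the same functions compile as plain C++ and, under nvcc, as device code.

```cpp
#include <cstdint>

// Hypothetical convenience macro: adds CUDA qualifiers only when nvcc compiles this.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Pack four 8-bit values into one 32-bit word; a lands in the lowest byte.
HD inline uint32_t pack4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return  static_cast<uint32_t>(a)
         | (static_cast<uint32_t>(b) << 8)
         | (static_cast<uint32_t>(c) << 16)
         | (static_cast<uint32_t>(d) << 24);
}

// Extract byte i (0 = lowest, 3 = highest) from a packed word.
HD inline uint8_t unpack4(uint32_t reg, int i)
{
    return static_cast<uint8_t>((reg >> (8 * i)) & 0xffu);
}
```

Keeping the packed word unsigned matters: with a signed int, a right shift of a value whose top bit is set is not guaranteed to shift in zeros.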

However, when I did that in my otherwise working code, it crashed!

Maybe you have an idea what is going wrong?
Or maybe another way to pack 4 such variables into one register?

Register count is currently the only limiting factor for one of my slow, memory-bound kernels, and I want to increase its occupancy to (hopefully) hide latencies better.

I pack two values into one register, but the same approach will probably work for your four as well…

int iCalculatedPositions = <some value 1>;

	iCalculatedPositions <<= 16;	// Shift the data to the high bits and then add MoveSample to the low bits.

	iCalculatedPositions += <some value 2>;

	// pCalculatedNb is an int * parameter...

	pCalculatedNb[ iOutputPos ] = iCalculatedPositions;

Edit: I forgot to show how to unpack it :)

int iCalculatedPositions = pCalculatedNb[ iCalculatedInputPos ];

   smnumMoveSamples[ threadIdx.x ] = iCalculatedPositions & 0x0000FFFF;

   smNb[ threadIdx.x ] = ( iCalculatedPositions & 0xFFFF0000 ) >> 16;
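For reference, the same two-values-per-word idea as a round trip. This is a sketch under a few assumptions: the helper names (pack2x16, highHalf, lowHalf) and the HD macro are hypothetical, and it uses an unsigned word and masks the low value. With the += in the code above, a second value larger than 0xFFFF would spill into the high half.

```cpp
#include <cstdint>

// Hypothetical convenience macro: adds CUDA qualifiers only when nvcc compiles this.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Store `high` in bits 16..31 and `low` in bits 0..15 of one 32-bit word.
// `low` is masked so it cannot corrupt the high half.
HD inline uint32_t pack2x16(uint32_t high, uint32_t low)
{
    return (high << 16) | (low & 0xffffu);
}

// Recover the value stored in the upper 16 bits.
HD inline uint32_t highHalf(uint32_t packed)
{
    return packed >> 16;
}

// Recover the value stored in the lower 16 bits.
HD inline uint32_t lowHalf(uint32_t packed)
{
    return packed & 0xffffu;
}
```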

BTW - one other way to save regs is to use shared memory. If you can spare a certain amount of shared memory, you can put data in shared memory (one item per thread) instead of registers.

eyal

I don’t have much shared memory left. I do similar stuff to what you do, but for some reason it is not working in my case :(

Let me copy all the relevant code here.

Old version:

int var1=0;

int var2=0;

int var3=0;

if (condition1) ++var1;

else if (condition2) ++var2;

else ++var3;

[...]

float cost=alpha*(var1+var3)+beta*(var2+var3)+epsilon;

[...]

globalArray[idx].v1=var1;

globalArray[idx].v2=var2;

globalArray[idx].v3=var3;

New version:

unsigned int var;

if (condition1) ++var1;

else if (condition2) var2+=0x100;

else var3+=0x10000;

[...]

float cost=alpha*( (var&0xff) +( (var>>16)&0xff) )+beta*( ((var>>8)&0xff) + ((var>>16)&0xff) )+epsilon;

[...]

globalArray[idx].v1=(var&0xff);

globalArray[idx].v2=((var>>8)&0xff);

globalArray[idx].v3=((var>>16)&0xff);

Rest of code does not depend directly on var-s.

Incrementation part is in a loop which never takes more than 32 steps.

Edit: corrected a stupid mistake, as eyalhir74 pointed out below. The problem remains, though…

I think your new code should be like this, no? You should initialize var to zero and use only var, not var1, var2 or var3.

unsigned int var = 0;

if (condition1) ++var;

else if (condition2) var+=0x100;

else var+=0x10000;

[...]

float cost=alpha*( (var&0xff) +( (var>>16)&0xff) )+beta*( ((var>>8)&0xff) + ((var>>16)&0xff) )+epsilon;

[...]

globalArray[idx].v1=(var&0xff);

globalArray[idx].v2=((var>>8)&0xff);

globalArray[idx].v3=((var>>16)&0xff);
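The same logic as small helpers, in case that makes it easier to check. This is only a sketch: bumpCounter, packedCost and the HD macro are hypothetical names, not part of your code. Since the loop never takes more than 32 steps, no counter can exceed 32, so each one fits in its own byte without carrying into the next.

```cpp
#include <cstdint>

// Hypothetical convenience macro: adds CUDA qualifiers only when nvcc compiles this.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Add 1 to counter `which` (0, 1 or 2); counter k lives in byte k of var.
// Equivalent to ++var, var += 0x100 and var += 0x10000 respectively.
HD inline uint32_t bumpCounter(uint32_t var, int which)
{
    return var + (1u << (8 * which));
}

// cost = alpha*(v1+v3) + beta*(v2+v3) + epsilon, with v1..v3 read from the packed word.
HD inline float packedCost(uint32_t var, float alpha, float beta, float epsilon)
{
    const uint32_t v1 = var & 0xffu;
    const uint32_t v2 = (var >> 8) & 0xffu;
    const uint32_t v3 = (var >> 16) & 0xffu;
    return alpha * static_cast<float>(v1 + v3)
         + beta  * static_cast<float>(v2 + v3)
         + epsilon;
}
```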

Ah, that is a copy-paste typo :)
Of course I don’t use var2, var3 etc. - that would cause a compile error in the first place.

Correction: my problem -was- a copy-paste error. Apparently I did use one of those leftover variables in my code; it was declared but never used anywhere else.
The warning message got lost among lots of other warnings of the type ‘Advisory, cannot tell where the pointer points to…’

Thank you

:) copy-paste is a great invention ;)