As you all know, minimising register usage is important for keeping more threads executing simultaneously. It matters even more once the kernel algorithm becomes a bit complex.
I found a few small tricks to minimise register usage (without changing the algorithm), and I’d like everyone interested to join in and share the tricks they have found.
Thanks to these tricks I reduced the number of registers used by my kernel (as reported in the .cubin file) from 29 to 19.
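To give an idea of what that difference can mean, here is a rough sketch of the arithmetic. It assumes a multiprocessor with 8192 registers (the G80-class figure; yours may differ) and ignores allocation granularity and every other limit on resident threads, so treat the numbers as upper bounds only. The names `maxThreadsByRegisters`, `dummyKernel` and `kernelRegisterCount` are made up for illustration; `cudaFuncGetAttributes` is the runtime call that reports registers per thread on toolkits that provide it.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: upper bound on resident threads per multiprocessor
// when registers are the only limit (ignores allocation granularity).
__host__ __device__ inline int maxThreadsByRegisters(int regsPerSM, int regsPerThread)
{
    return regsPerThread > 0 ? regsPerSM / regsPerThread : 0;
}

// Placeholder kernel, only here so there is something to query.
__global__ void dummyKernel(float *out)
{
    out[threadIdx.x] = 2.0f * threadIdx.x;
}

// Registers per thread of dummyKernel as seen by the runtime, or -1 on error.
inline int kernelRegisterCount()
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, dummyKernel) != cudaSuccess)
        return -1;
    return attr.numRegs;
}
```

Under those assumptions, `maxThreadsByRegisters(8192, 29)` gives 282 threads and `maxThreadsByRegisters(8192, 19)` gives 431.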
Here are some of mine:
First, the one that gained me the most:
Operands should be assigned to explicit temporaries before repeated operations: this prevents the compiler from picking a different register for each operand when two or three would be enough.
(I don’t know whether the register keyword helps, but declaring the operands explicitly does.)
register float opRF;
register float opFilter;
RFData = texfetch(texRFData_0, decal, yTex);
opRF = RFData.x; opFilter = Fval.x; resX += opRF * opFilter;
opRF = RFData.y; opFilter = Fval.y; resX += opRF * opFilter;
opRF = RFData.z; opFilter = Fval.z; resX += opRF * opFilter;
opRF = RFData.w; opFilter = Fval.w; resX += opRF * opFilter;
instead of:
RFData = texfetch(texRFData_0, decal, yTex);
resX += RFData.x * Fval.x;
resX += RFData.y * Fval.y;
resX += RFData.z * Fval.z;
resX += RFData.w * Fval.w;
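For anyone who wants to compare the two forms on their own toolkit, here is a minimal self-contained sketch. It is not my actual kernel: the texture fetch is replaced by plain float4 loads from global memory (an assumption made so the example compiles on its own), and the names `dot4Temps`, `dot4Direct` and `filterKernel` are invented for illustration. Whether the two variants end up with different register counts depends on the compiler version, so it is worth checking rather than assuming.

```cuda
#include <cuda_runtime.h>

// Variant with explicit operand temporaries that are reused for every term.
__host__ __device__ inline float dot4Temps(float4 rf, float4 f)
{
    float opRF, opFilter;
    float res = 0.0f;
    opRF = rf.x; opFilter = f.x; res += opRF * opFilter;
    opRF = rf.y; opFilter = f.y; res += opRF * opFilter;
    opRF = rf.z; opFilter = f.z; res += opRF * opFilter;
    opRF = rf.w; opFilter = f.w; res += opRF * opFilter;
    return res;
}

// Direct variant that multiplies the components in place.
__host__ __device__ inline float dot4Direct(float4 rf, float4 f)
{
    float res = 0.0f;
    res += rf.x * f.x;
    res += rf.y * f.y;
    res += rf.z * f.z;
    res += rf.w * f.w;
    return res;
}

// Hypothetical kernel: one float4 dot product per thread.
// Swap dot4Temps for dot4Direct and rebuild to compare register usage.
__global__ void filterKernel(const float4 *rfData, const float4 *fval,
                             float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = dot4Temps(rfData[i], fval[i]);
}
```

Building each variant with `nvcc --ptxas-options=-v` prints the register count per kernel, which makes the comparison quick.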
Second: minimise the number of arguments in the kernel call by turning them into global constants where possible.
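One way to do this, sketched below, is to group the values that never change during a launch into a struct held in `__constant__` memory and copy it once from the host with `cudaMemcpyToSymbol`. The struct `FilterParams`, its fields, the kernel `scaleKernel` and the helper `uploadParams` are all hypothetical names chosen for the example. How much this saves (if anything) depends on the hardware generation and the compiler.

```cuda
#include <cuda_runtime.h>

// Hypothetical parameter block: values that stay fixed for a whole launch.
struct FilterParams {
    int   width;
    int   height;
    float gain;
};

// Lives in constant memory instead of being passed as kernel arguments.
__constant__ FilterParams cParams;

// Only the two pointers remain as arguments; the rest comes from cParams.
__global__ void scaleKernel(const float *in, float *out)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < cParams.width && y < cParams.height) {
        int i = y * cParams.width + x;
        out[i] = cParams.gain * in[i];
    }
}

// Host side: copy the parameters to constant memory before launching.
inline cudaError_t uploadParams(const FilterParams &p)
{
    return cudaMemcpyToSymbol(cParams, &p, sizeof(FilterParams));
}
```

Call `uploadParams` once (or whenever the values change), then launch `scaleKernel` with just the input and output pointers.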
That’s all for now. Do you have any tricks to share?