I recently added an arbitrary-precision big-integer class to my gxLibrary ( https://sourceforge.net/projects/gxlibrary ), which allows working with large integers in CUDA (and AMP and CPU too), like:
intB<128> A=7, B=99, C=A/B;
A>>=1; B++;
Internally, the intB class defines a constant integer that represents how many 32-bit unsigned ints are used. Something like:
static const int N = Nbits/32;
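Roughly, the class presumably looks like this (a sketch reconstructed from the description above; every name except intB, N and d is hypothetical):

template<int Nbits>
class intB {
    static const int N = Nbits/32;  // number of 32-bit words
    unsigned int d[N];              // d[0] is the least significant word
public:
    intB(unsigned int v);           // allows intB<128> A = 7;
    intB& operator++();             // plus the other arithmetic operators
    // ...
};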
When I decided to optimize some operations for small N (for example N==4 for 128-bit, or N==3 for 96-bit), I used something like this in the code:
intB& operator++(){
    switch (N){
        case 4: if (!++d[0]) if (!++d[1]) if (!++d[2]) ++d[3]; break;
        case 3: if (!++d[0]) if (!++d[1]) ++d[2]; break;
        case 2: if (!++d[0]) ++d[1]; break;
        case 1: ++d[0]; break;
        default:  // generic ripple-carry loop for N > 4
            for (int i = 0; i < N; i++){
                ++d[i];
                if (d[i]) break;  // stop as soon as a word does not wrap to zero
            }
    }
    return *this;
}
Since the N used above is a compile-time constant (template parameters are resolved at compile time), compilers should remove the unneeded code paths at compile time, so switch(N) effectively becomes just the code for the given N. It works exactly like that when compiled for CPU or AMP (gxLibrary compiles the code for all three targets: CUDA/AMP/CPU).
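For example, for N==2 the optimizer should reduce the whole function above to nothing more than:

intB& operator++(){
    if (!++d[0]) ++d[1];  // only the case 2 branch survives
    return *this;
}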
But the CUDA compiler appears not to recognize that N is constant, since it emits multiple "subscript out of range" warnings (the code contains d[3], which is flagged even when N==2, although that branch should have been eliminated at compile time).
While I could ignore the warnings, my main question is whether these are only warnings, or whether the CUDA compiler has also failed to remove the unneeded code paths and left the "if (N==xyz)" comparisons or the switch(N) dispatch in the generated code, in which case it would also have a slight performance impact.
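(For reference, one way to make the dead branches disappear at the language level, rather than relying on the optimizer, would be to dispatch on N through a specialized helper template, so that no instantiation ever contains an out-of-range subscript. A sketch, assuming plain C++ and omitting the CUDA/AMP qualifiers; IncWords is a hypothetical name, not gxLibrary code:

template<int N> struct IncWords {       // generic ripple-carry version
    static void inc(unsigned int *d){
        for (int i = 0; i < N; i++){
            ++d[i];
            if (d[i]) break;
        }
    }
};
template<> struct IncWords<2> {         // fast path: only d[0] and d[1] exist
    static void inc(unsigned int *d){
        if (!++d[0]) ++d[1];
    }
};
// ... similar specializations for N==1, 3, 4; operator++ would then just
// call IncWords<N>::inc(d) and return *this.

This sidesteps the warning because the compiler never instantiates a branch containing d[3] when N==2.)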