While attempting to optimize one of my kernels, I encountered the following assertion failure on most of my test machines (Tesla C1060, GeForce 9800 GT, GTX 285) but was unable to reproduce it on my laptop (GeForce 8600M GT). All systems were running Ubuntu 9.04 amd64 with the 185.18.36 driver and using the CUDA 2.3 compiler.
/home/buildmeister/build/sw/rel/gpu_drv/r185/r185_66/drivers/gpgpu/cuda/src/gpgpucomp/../../../../common/cop/codegen/nv50/cop_nv50_common.cpp:6532: int LoadHighConstantsNV50(LdStruct*, Dag*, void*, int): Assertion `lIndex->arg0.child->GetKind() == DK_VARIABLE' failed.
Original code (works):
if(xlen >= 0) istrcat_fast_inner(dest0, dest1, src0); \
if(xlen >= 1) istrcat_fast_inner(dest1, dest2, src1); \
if(xlen >= 2) istrcat_fast_inner(dest2, dest3, src2); \
if(xlen >= 3) istrcat_fast_inner(dest3, dest4, src3); \
if(xlen >= 4) istrcat_fast_inner(dest4, dest5, src4); \
if(xlen >= 5) istrcat_fast_inner(dest5, dest6, src5); \
if(xlen >= 6) istrcat_fast_inner(dest6, dest7, src6); \
if(xlen >= 7) istrcat_fast_inner(dest7, dest8, src7); \
if(xlen >= 8) istrcat_fast_inner(dest8, dest8, src8); \
if(xlen >= 9) istrcat_fast_inner(dest9, dest10, src9); \
if(xlen >= 10) istrcat_fast_inner(dest10, dest10, src10);
My optimized code turned these if statements into something like this:
switch(xlen)
{
case 0:
/* >=0 code */
break;
case 1:
/* >=0 code */
/* >=1 code */
break;
/* etc */
}
In case it is relevant, this entire block was repeated inside every case of a switch statement, which was itself inside a macro.
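For anyone trying to reduce this to a smaller test case, the transformation described above can be sketched in plain host-side C. This is only an illustration under stated assumptions: `Trace`, `step`, `run_cascade`, `run_switch`, `traces_match` and `check_all` are hypothetical names, `step` merely stands in for `istrcat_fast_inner`, only four levels are shown instead of eleven, and nothing here is expected to trigger the assertion itself. It just checks that the two forms run the same steps in the same order.

```c
#include <assert.h>
#include <string.h>

#define MAX_STEPS 4

/* Hypothetical stand-in for the real kernel work: records which step ran. */
typedef struct {
    int steps[MAX_STEPS];
    int count;
} Trace;

static void step(Trace *t, int i)
{
    t->steps[t->count++] = i;
}

/* Original form: a cascade of independent threshold checks. */
static void run_cascade(Trace *t, int xlen)
{
    if (xlen >= 0) step(t, 0);
    if (xlen >= 1) step(t, 1);
    if (xlen >= 2) step(t, 2);
    if (xlen >= 3) step(t, 3);
}

/* Rewritten form: each case repeats every step up to its own index. */
static void run_switch(Trace *t, int xlen)
{
    switch (xlen) {
    case 0:
        step(t, 0);
        break;
    case 1:
        step(t, 0); step(t, 1);
        break;
    case 2:
        step(t, 0); step(t, 1); step(t, 2);
        break;
    case 3:
        step(t, 0); step(t, 1); step(t, 2); step(t, 3);
        break;
    }
}

/* Returns 1 if both forms executed the same steps in the same order. */
static int traces_match(int xlen)
{
    Trace a, b;
    memset(&a, 0, sizeof a);
    memset(&b, 0, sizeof b);
    run_cascade(&a, xlen);
    run_switch(&b, xlen);
    return a.count == b.count && memcmp(a.steps, b.steps, sizeof a.steps) == 0;
}

/* Checks every value the switch covers, plus one negative value. */
static void check_all(void)
{
    for (int xlen = -1; xlen < MAX_STEPS; ++xlen)
        assert(traces_match(xlen));
}
```

One thing the sketch makes visible: the two forms only agree for values that have a case label. For an `xlen` above the largest case, the cascade still runs every step while the switch (without a `default`) runs none, so the rewrite is only safe if `xlen` is known to stay in range.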