First of all, my apologies if this turns out to be a rather trivial error in reasoning on my part. I am relatively new to CUDA programming, and the code where the problem lies is not my own, so that is entirely plausible. But the problem I encountered (and especially its solution) seemed weird enough to me that I wanted to at least run it past the forums.
I am in the process of migrating a framework from CUDA 2.1 to 2.3. While doing this, I ran into a few expand.cxx errors, most of which I could quickly track down to missing initialization on the framework's part. However, one problem remained.
It produced the following error message:
Assertion failure at line 123 of …/…/be/cg/NVISA/expand.cxx:
Compiler Error in file C:…/tmpxft_00000de0_00000000-9_CUDAraytraceHost.cpp3.i during Code_Expansion phase:
unexpected mtype
nvopencc ERROR: C:CUDAbin/…/open64/lib//be.exe returned non-zero status 1
I managed to track it down to the traverseGrid function, which is called in the following code:
hitrec.inShadow = -1;
tinfo.cont = 0;
do
{
    if(!tinfo.cont)
    {
        org = findRayOrgInGrid(ray, id);
        tinfo = setupGridTraverse(ray, org);
    }
    // CUDA 2.3 fail!!!
    tinfo = traverseGrid(tinfo, id);
    hitrec.inShadow = shadowRayIntersection(&ray, tinfo, id, mag);
    if(hitrec.inShadow == -1)
        tinfo.cont = 1;
} while(hitrec.inShadow == -1);
And specifically to the following fragment in this function:
if(tinfo.d.x <= tinfo.d.y && tinfo.d.x <= tinfo.d.z)
{
    if(tinfo.d.x >= tinfo.s.x)
        pastGrid = true;
    else
    {
        // The following line causes the expand.cxx error in CUDA 2.3!!
        tinfo.d.x += tinfo.delta.x;
        voxel.x += tinfo.step.x;
    }
}
Having tracked the error down to tinfo.delta.x, I assumed it would be a simple case of an uninitialized variable, so I looked into the function that initializes it.
This didn't lead me to a consistent solution, but it did make me realize something else. The code for delta.x is completely symmetric with the code for delta.y and delta.z, yet those two variables did not cause the error that delta.x caused (I tested this thoroughly). Going through the code, I expected to find a difference somewhere (maybe a swapped x/y/z), but I found no indication of why this error would be limited to delta.x.
I took a quick glance at the structure definition, which is as follows:
struct __align__(16) traverseInfo{
    float4 delta;
    float4 d;
    float4 s;
    int4 step;
    int4 voxel;
    int4 thisVoxel;
    int cont;
};
Because the error only occurred with delta.x, the very first field of the structure, I got a sneaking suspicion that it might be caused by something that, at least in my eyes, shouldn't cause it.
So I switched the order of float4 delta; and float4 d; and suddenly everything was fixed. Since d.x was initialized in all cases, I tried to reproduce the error by removing the initialization of d.x in one branch, which brought the error back, while doing the same with d.y and d.z (again, symmetric/modular code) did not. I also tested whether I could remove the error by making sure delta.x was always initialized, with mixed results, whereas simply swapping delta and d in the traverseInfo structure fixed the problem consistently (and even if initializing delta.x had worked consistently, that would not explain why the reordering helps).
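To make the reordering concrete, here is a minimal host-side sketch of the two layouts. It is not the framework's code: Float4 and Int4 are stand-ins I made up for CUDA's float4 and int4 (assumed to be four 4-byte components, 16-byte aligned), alignas(16) replaces __align__(16) so it builds as plain C++11, and the struct and function names are hypothetical. It only shows that swapping delta and d changes which field sits at offset 0 and nothing else; it says nothing about why the compiler trips over that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Stand-ins for CUDA's float4 / int4 (assumption: 16 bytes, 16-byte aligned).
struct alignas(16) Float4 { float x, y, z, w; };
struct alignas(16) Int4   { int   x, y, z, w; };

// Member order from the post: the one that triggers the expand.cxx error.
struct alignas(16) TraverseInfoOriginal {
    Float4 delta;
    Float4 d;
    Float4 s;
    Int4   step;
    Int4   voxel;
    Int4   thisVoxel;
    int    cont;
};

// Same members with delta and d swapped: the order that compiles.
struct alignas(16) TraverseInfoReordered {
    Float4 d;
    Float4 delta;
    Float4 s;
    Int4   step;
    Int4   voxel;
    Int4   thisVoxel;
    int    cont;
};

// The swap must not change the overall size, only who occupies offset 0.
static_assert(sizeof(TraverseInfoOriginal) == sizeof(TraverseInfoReordered),
              "reordering two float4 members should not change the size");
static_assert(offsetof(TraverseInfoOriginal, delta) ==
              offsetof(TraverseInfoReordered, d),
              "delta and d should simply trade places");

// Prints the offsets of the fields of interest for both layouts.
void printLayout() {
    // Sanity check of the stated assumption about component sizes.
    assert(sizeof(float) == 4 && sizeof(int) == 4);
    std::printf("original : delta@%zu d@%zu cont@%zu size=%zu\n",
                offsetof(TraverseInfoOriginal, delta),
                offsetof(TraverseInfoOriginal, d),
                offsetof(TraverseInfoOriginal, cont),
                sizeof(TraverseInfoOriginal));
    std::printf("reordered: d@%zu delta@%zu cont@%zu size=%zu\n",
                offsetof(TraverseInfoReordered, d),
                offsetof(TraverseInfoReordered, delta),
                offsetof(TraverseInfoReordered, cont),
                sizeof(TraverseInfoReordered));
}
```

Calling printLayout() from main should show both layouts with the same total size and the same offset for cont, with only the first two fields trading places. As far as alignment goes the two versions are therefore equivalent, which is part of why the fix puzzles me.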
So … maybe I'm missing something really trivial about compiling CUDA code, or about structs (e.g. I am not familiar enough with struct alignment to rule it out as a cause), in which case my apologies for wasting your time. Still, I thought it wise to run this by you guys to see what you think.
Kind regards,
Carlo Vloet
P.S. Taking findRayOrgInGrid and setupGridTraverse out of their if-clause also removes the error. However, this makes my program fall flat on its face, so it is not an option. Furthermore, it doesn't explain why the error is limited to the use of delta.x when the code for delta.y/delta.z is completely analogous and independent, or why reordering the structure solves the problem as well (without making my program fall flat on its face).
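One variant I can still try is clearing the whole structure before the loop instead of only setting tinfo.cont, so that every field has a defined value on every path into traverseGrid. Below is a minimal host-side sketch of that idea. The helper clearedTraverseInfo and the Float4/Int4 stand-ins are hypothetical and not part of the framework, the real version would use float4/int4 and __align__(16) and would need a __device__ qualifier, and I have not verified that this avoids the assertion.

```cpp
#include <cassert>

// Stand-ins for CUDA's float4 / int4 (assumption: 16 bytes, 16-byte aligned).
struct alignas(16) Float4 { float x, y, z, w; };
struct alignas(16) Int4   { int   x, y, z, w; };

// Same member order as the traverseInfo structure in the post.
struct alignas(16) TraverseInfo {
    Float4 delta;
    Float4 d;
    Float4 s;
    Int4   step;
    Int4   voxel;
    Int4   thisVoxel;
    int    cont;
};

// Hypothetical helper: returns a TraverseInfo with every member set to zero.
inline TraverseInfo clearedTraverseInfo() {
    TraverseInfo t = {};  // empty braces value-initialize all members
    return t;
}

// Small self-check that the helper really zeroes the fields of interest.
inline void checkCleared() {
    TraverseInfo t = clearedTraverseInfo();
    assert(t.cont == 0);
    assert(t.delta.x == 0.0f && t.d.x == 0.0f);
}
```

In the loop from my first snippet, this would replace the line tinfo.cont = 0; with tinfo = clearedTraverseInfo(); and leave the rest untouched, so the if(!tinfo.cont) logic that my program depends on stays as it is.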
- Operating System: Windows 7 Professional 64-bit
- CUDA toolkit release version: 2.3 32-bit (program has been developed mostly on 2.1 32-bit … slowly transitioning towards 2.3 64-bit)
- SDK release version: 2.3 32-bit (not completely certain, since my installs have gotten rather convoluted … will install everything fresh soon and try to reproduce the error)
- Compiler for CPU host code: Visual Studio 2008 C/C++ compiler
- System description including
  CPU type/speed: AMD Athlon 64 X2 Dual Core Processor 3800+ 2.01 GHz
  Installed RAM: 2.00 GB
  System type/model: 64 bit/?
  Video cards installed in the system: GeForce 8800 GTS 512 MB (provided for research by Nvidia Developer Technology Tools - Program Manager, big thanks!!)
  Chipset type: G92
  Motherboard: Asus A7n8x-E