Platform: Windows XP 32bit
CUDA 2.3 Error:
1>### Assertion failure at line 123 of ../../be/cg/NVISA/expand.cxx:
1>### Compiler Error in file blah.cpp3.i during Code_Expansion phase:
1>### unexpected mtype
1>nvopencc ERROR: <snip>\thirdparty/cudatoolkit2.2/bin/win_ia32/open64/lib//be.exe returned non-zero status 1
CUDA 2.2 Error: ptxas appears to run in an infinite loop (I waited 20+ minutes; it got to about 3.5 GB of memory usage, so I had to kill it)
I first hit this error when I started adding IEEE-compliant paths to all of my mathematical functions (so I could switch between fast math and correct math with a simple macro switch).
The only difference between the working code and the code that breaks nvcc/ptxas is the use of the IEEE floating-point intrinsics (__fmaf_rn, __fsqrt_rn, __fmul_ru, etc.).
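For illustration only, a macro switch of the kind described above might look roughly like the sketch below. It is not taken from the attached bug.txt: the names USE_IEEE_MATH, DEVICE_FN, mad_f and sqrt_f are hypothetical, and the host fallback is only there so the snippet also builds with a plain C++ compiler.

```cpp
#include <math.h>

// Hypothetical names throughout: USE_IEEE_MATH, DEVICE_FN, mad_f, sqrt_f.
#ifdef __CUDACC__
#define DEVICE_FN __host__ __device__ inline
#else
#define DEVICE_FN inline  // plain C++ build: no CUDA qualifiers available
#endif

// Computes a * b + c.
DEVICE_FN float mad_f(float a, float b, float c) {
#if defined(USE_IEEE_MATH) && defined(__CUDA_ARCH__)
    return __fmaf_rn(a, b, c);  // device path: fused multiply-add, round-to-nearest
#else
    return a * b + c;           // default path: left to the compiler
#endif
}

// Computes the square root of x.
DEVICE_FN float sqrt_f(float x) {
#if defined(USE_IEEE_MATH) && defined(__CUDA_ARCH__)
    return __fsqrt_rn(x);       // device path: square root, round-to-nearest
#else
    return sqrtf(x);            // default path: standard single-precision sqrt
#endif
}
```

Defining USE_IEEE_MATH on the nvcc command line (for example with -DUSE_IEEE_MATH) would then select the intrinsic paths in device code without touching the callers.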
Unfortunately I can’t post blah.cpp2.i here (a: because I can’t find it, even after --keep’ing the files, and b: if it’s anything like the other .i/.ii/.gpu/etc. files, it contains about 3k lines of proprietary, copyrighted, and patented source code that I’m not authorized to reveal).
Instead I’ve attached the mathematical functions that use the IEEE intrinsics, which appear to cause the error. I should note, though, that I haven’t yet been able to reproduce the error in a smaller/simpler kernel…
Any help would be appreciated…
[s]Edit: After a day’s work on this, I’ve somehow managed to get the same error without using any intrinsics, so it would seem it’s not related to the intrinsics at all (I just got unlucky seeing this error for the first time when I started using the IEEE intrinsics).
After some googling I came across a similar report of this same error (probably would’ve found it via the forum search… if it worked), in which the respondents concluded that the error is caused by writing an uninitialized register into memory (global/shared). However, after going over my code rather thoroughly and initializing ‘everything’ at declaration, in addition to the initializations I do later on in the code, it hasn’t solved my problem…[/s]
[b]Final Edit: It seems it was indeed an uninitialized variable (one that I missed in my initial sweep), the same problem as in this post: http://forums.nvidia.com/index.php?showtop…rt=#entry575590
Again, as suggested in that post, a more relevant error message would save a lot of lost time in this case…[/b]
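For anyone hitting the same assertion, here is a rough sketch of the kind of pattern worth checking for: a value that is only assigned on some branches before being written to memory. This is an assumed, simplified example (classify_sample and classify_kernel are hypothetical names, not my actual code), and I can't promise the uninitialized variant triggers the exact same compiler error; it only shows the fix of initializing at declaration.

```cpp
// Hypothetical names: DEVICE_FN, classify_sample, classify_kernel.
#ifdef __CUDACC__
#define DEVICE_FN __host__ __device__ inline
#else
#define DEVICE_FN inline  // plain C++ build: no CUDA qualifiers available
#endif

// Returns +1 for positive input, -1 for negative input, 0 otherwise.
DEVICE_FN float classify_sample(float x) {
    // Initialized at declaration. Without "= 0.0f", the x == 0 (or NaN)
    // case would fall through both branches and return an uninitialized value.
    float result = 0.0f;
    if (x > 0.0f) {
        result = 1.0f;
    } else if (x < 0.0f) {
        result = -1.0f;
    }
    return result;
}

#ifdef __CUDACC__
// Writes one classified value per element to global memory.
__global__ void classify_kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = classify_sample(in[i]);  // the stored value is always defined
    }
}
#endif
```

The point is simply that every path leading to a global/shared memory store should assign the stored value, and initializing at declaration is the cheapest way to guarantee that.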
bug.txt (2.21 KB)