I wanted to write a simple function that detects a sign change, for use in my CUDA kernel. To avoid branching, I implemented it using only comparisons and boolean logic operations. Here is my code:
__host__ __device__
bool SignChange( const float & a, const float & b )
{
    /*
        0   a < 0
        1   a > 0
        2   0 ^ 1
        3   b < 0
        4   b > 0
        5   3 ^ 4
        (0 ^ 3)
        || (1 ^ 4)
        || (2 ^ 5)
    */
    bool d0 = a < 0;
    bool d1 = a > 0;
    bool d2 = d0 ^ d1;
    bool d3 = b < 0;
    bool d4 = b > 0;
    bool d5 = d3 ^ d4;
    return (d0 ^ d3) || (d1 ^ d4) || (d2 ^ d5);
}
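For reference, the intent is that SignChange returns true exactly when a and b fall into different sign classes (negative, zero, positive). A minimal host-side sketch to sanity-check that logic is below. It assumes a plain C++ build where the CUDA qualifiers are defined away; SignClass and SignChangeMatchesReference are hypothetical helper names I made up for this check, not part of my kernel. It only checks the logic on the host and says nothing about the nvopencc failure itself.

```cpp
// Assumption: when not compiled by nvcc, define the CUDA qualifiers away
// so the same function compiles as ordinary host C++.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Same logic as the function above, repeated so this sketch is self-contained.
__host__ __device__
bool SignChange( const float & a, const float & b )
{
    bool d0 = a < 0;
    bool d1 = a > 0;
    bool d2 = d0 ^ d1;   // a is non-zero
    bool d3 = b < 0;
    bool d4 = b > 0;
    bool d5 = d3 ^ d4;   // b is non-zero
    return (d0 ^ d3) || (d1 ^ d4) || (d2 ^ d5);
}

// Hypothetical helper: maps x to -1, 0 or +1.
int SignClass( float x )
{
    return (x > 0) - (x < 0);
}

// Hypothetical helper: true if SignChange agrees with a plain comparison
// of sign classes for every pair of sample values.
bool SignChangeMatchesReference()
{
    const float samples[] = { -2.0f, -0.5f, 0.0f, 0.5f, 2.0f };
    const int n = sizeof(samples) / sizeof(samples[0]);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
        {
            bool expected = SignClass(samples[i]) != SignClass(samples[j]);
            if (SignChange(samples[i], samples[j]) != expected)
                return false;
        }
    return true;
}
```

Calling SignChangeMatchesReference() from a host main() should be enough to confirm the function does what I intend, so the logic itself does not look like the problem.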
If SignChange is never called, it is ignored and the whole program compiles fine. If, however, I call it, compilation fails with the following error:
### Assertion failure at line 106 of ../../be/cg/NVISA/expand.cxx:
### Compiler Error in file keep/kernel.cpp3.i during Code_Expansion phase:
### unexpected mtype
nvopencc INTERNAL ERROR: /home/tener/localopt/cuda/open64/lib//be returned non-zero status 1
The nvcc version is:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2010 NVIDIA Corporation
Built on Wed_Nov__3_16:16:57_PDT_2010
Cuda compilation tools, release 3.2, V0.2.1221
The actual component that failed is nvopencc:
/home/tener/localopt/cuda/open64/bin/nvopencc -TARG:compute_20 -m64 -CG:ftz=1 -CG:prec_div=1 -CG:prec_sqrt=1 "keep/kernel" "keep/kernel.cpp3.i" -o "keep/kernel.ptx"
### Assertion failure at line 106 of ../../be/cg/NVISA/expand.cxx:
### Compiler Error in file keep/kernel.cpp3.i during Code_Expansion phase:
### unexpected mtype
nvopencc INTERNAL ERROR: /home/tener/localopt/cuda/open64/lib//be returned non-zero status 1
Version:
NVIDIA (R) CUDA Open64 Compiler
Cuda compilation tools, release 3.2, V0.2.1221
Built on 2010-11-03
Portions Copyright (c) 2005-2010 NVIDIA Corporation
Portions Copyright (c) 2002-2005 PathScale, Inc.
Portions Copyright (c) 2000-2001 Silicon Graphics, Inc.
All Rights Reserved.
I have attached the relevant file, kernel.cpp3.i. The faulty code is on line 6026, where the function in question has been inlined:
__cuda_local_var_149505_10_non_const_sign_has_changed = (((((((__T261 = ((char)((*((const float *)(&__cuda_local_var_149504_11_non_const_val))) < (0.0F)))) , ((void)(__T262 = ((char)((*((const float *)(&__cuda_local_var_149504_11_non_const_val))) > (0.0F)))))) , ((void)(__T263 = ((char)((((int)__T261) ^ ((int)__T262)) != 0))))) , ((void)(__T264 = ((char)((*((const float *)(&__cuda_local_var_149512_9_non_const_tmp))) < (0.0F)))))) , ((void)(__T265 = ((char)((*((const float *)(&__cuda_local_var_149512_9_non_const_tmp))) > (0.0F)))))) , ((void)(__T266 = ((char)((((int)__T264) ^ ((int)__T265)) != 0))))) , ((char)(((((int)__T261) ^ ((int)__T264)) || (((int)__T262) ^ ((int)__T265))) || (((int)__T263) ^ ((int)__T266)))));
Please tell me if you need more information.
kernel.cpp3.i.txt (263 KB)