Strange behavior when compiling an expression with powf()


The code below triggers strange behavior at compile time when targeting a GeForce 9400M.
With t1 and t2a it compiles, but if t1 is substituted into t2a (giving t2b), compilation fails.

Other expressions of this kind fail in the same way whenever powf() is involved.
Why? And how can it be corrected?

(Because the expression and the kernel itself are generated by host logic, it would be really nice if it could compile as a single line, regardless of performance.)

Thank you!

__global__ void mykernel (float *decision, float *data, int L, int K) {
    int idx = threadIdx.x + threadIdx.y * 32 + blockIdx.x * blockDim.x;
    float r47 = data[idx + 47 * L];
    float r33 = data[idx + 33 * L];
    float r36 = data[idx + 36 * L];
    float r20 = data[idx + 20 * L];
    if (idx < L) {
        float t1 = powf(fdividef(r33, r47), 3.0f);
        float t2a = sinf(((sqrtf(r36) > cosf(r20)) ? sqrtf(r36) : cosf(r20)) or t3 or 0.0f);
        //float t2b = sinf(((sqrtf(r36) > cosf(r20)) ? sqrtf(r36) : cosf(r20)) or powf(float(r33/r47), 3.0f) or 0.0f);

nvcc options:

nvcc --cubin --maxrregcount=16 -arch sm_11 -m64 -I....../cuda

The compiler output is:

### Assertion failure at line 6244 of ../../be/cg/whirl2ops.cxx:
### Compiler Error in file /tmp/tmpxft_0000d781_00000000-9_kernel.cpp3.i during Code_Expansion phase:
### asm kid not an asm_input?
nvopencc INTERNAL ERROR: /usr/local/cuda/bin//../open64/lib//be returned non-zero status 1

The compiler is:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Fri_Jan_13_01:52:47_PST_2012
Cuda compilation tools, release 4.1, V0.2.1221

This kind of assertion failure is an internal compiler error, that is, a bug in the compiler. There is nothing you can do about it in the compiler itself. You are using an ancient version of CUDA and therefore an ancient compiler. Consider upgrading to a newer version; this will likely fix your problem. Note, however, that sm_11 devices are no longer supported by the latest versions of CUDA, nor are they supported in recent driver packages (this applies to both graphics and CUDA drivers).

I am not entirely sure (so double-check before proceeding), but I think CUDA 6.0 was the last version to support all of the sm_1x devices. Older CUDA versions can be retrieved from NVIDIA's CUDA toolkit archive.

CUDA 6.0 supports cc1.0 - cc1.3
CUDA 6.5 supports cc1.1 - cc1.3, so it should be usable for sm_11 compilation (although a deprecation message will appear)
CUDA 7.0 and beyond do not support these GPUs (cc1.x)

It looks like the 5.0 toolkit still doesn’t solve the problem; however, the notebook and the OS X 10.6 on it are too old for toolkits newer than 5.0.
Well, the target will be a Jetson TK-1 when it comes to the point, but nah…

It is possible that you are hitting an issue that was never reported to NVIDIA while sm_1x was still supported and that the problem persisted through various later versions. Your best bet for a zero-cost fix is to try the latest CUDA version that supports sm_11 (see the information provided by txbob above), or to try and work around the problem with source code changes.

Since your ultimate target is a Jetson TK-1, it could also be a good idea to acquire that platform right away, then switch to the latest CUDA version and never look back. Dealing with outdated hardware and software that is no longer supported easily makes for a frustrating experience.

I can’t compile the code you have shown, even after removing the …

So I’m reasonably certain that is not the code that is causing the issue. It may be something like the code, I don’t know.

There have been compiler bugs in the past that were caused by uninitialized memory / dangling pointers, and those had a tendency to be hard to reproduce since they only triggered if the memory location in question had the “right” contents. Impossible to say whether this could be the case here.

The bug may also be coupled to certain optimization phases, so it may make sense to try -Xopencc -O{1|2} as a possible workaround (-O3 is the default for Open64).
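
As a rough sketch of that suggestion, the command line from the question with a lower optimization level passed through to the Open64-based nvopencc might look like the following. The input file name kernel.cu is an assumption (the question does not show it), and the include path is left out because it was elided in the original post.

```shell
# Pass -O1 through to nvopencc instead of its default -O3.
# kernel.cu is a hypothetical file name; substitute the real source file.
nvcc --cubin --maxrregcount=16 -arch sm_11 -m64 -Xopencc -O1 kernel.cu
```

Replacing -O1 with -O2 gives the other variant mentioned here; whether either one avoids the assertion depends on which optimization phase triggers it.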

I owe you one guys, -O1 resolved my issue.
