[BUG] nvopencc: incorrect register class for operand 0

Hi forum,

I’ve been stuck on a bug in the nvcc compiler. I’ve reduced the problem to the sample kernel below.

When compiling this using ‘nvcc’ (without options) I get the following error:

### Assertion failure at line 1923 of ../../be/cg/cgemit.cxx:

### Compiler Error in file /tmp/tmpxft_00003f79_00000000-7_bug.cpp3.i during Assembly phase:

### incorrect register class for operand 0

nvopencc INTERNAL ERROR: /opt/cuda/open64/lib//be returned non-zero status 1

The code looks like this:

#include <stdlib.h>
#include <stdint.h>

#define TX blockIdx.x * (blockDim.x * blockDim.y) + (blockDim.y * threadIdx.x) + threadIdx.y

#define l2n(l,c)        (*(c)=(unsigned char)(((l)>>24L)&0xff), \
                         *(c+1)=(unsigned char)(((l)>>16L)&0xff), \
                         *(c+2)=(unsigned char)(((l)>> 8L)&0xff), \
                         *(c+3)=(unsigned char)(((l)     )&0xff))

__global__ void demo_kernel(uint64_t *data) {
	
	uint32_t x2,x4,l0,l1;
	//l0=l1=0;
	uint64_t block = 0;
	block = data[TX];
	l2n(l0,(unsigned char *)&block);
	l2n(l1,((unsigned char *)&block)+4);
	data[TX] = block;
	
}

int main(int argc, char **argv) {
	return 0;
}

This problem vanishes if you uncomment the l0=l1=0 line. Unfortunately, this does not work for my original kernel, which is why I haven’t found a workaround yet, despite desperate attempts.

I was able to produce this problem on my workstation and laptop:

Workstation: Gentoo Linux, CUDA-SDK 3.2 and CUDA-Toolkit 3.2, gcc 4.5.1. Intel Pentium D (x86_64), 3GB RAM, Geforce 8600 GT (CC 1.1)

Laptop: Mac OS 10.6, CUDA-SDK 3.2 and CUDA-Toolkit 3.2, gcc 4.2.1, Intel Core 2 Duo, 2GB RAM, Geforce 9400M (CC 1.1)

I found this problem while working on a bigger kernel, which I don’t want to post since it’s still very convoluted and shows my inexperience with CUDA.

Thank you for bringing this issue to our attention, and for providing a standalone repro case. I was able to reproduce the issue on 64-bit Linux (RHEL) with the CUDA 3.2 toolchain, and have filed a compiler bug. As you already noticed, the way to avoid the internal compiler assertion is to ensure variables are initialized prior to use.

~/[...]/toolkit/r3.2/[...] $ nvcc -o assert_be assert_be.cu

assert_be.cu(15): warning: variable "l0" is used before its value is set

assert_be.cu(16): warning: variable "l1" is used before its value is set

assert_be.cu(15): warning: variable "__cuda_local_var_23884_14_non_const_l0" is used before its value is set

assert_be.cu(16): warning: variable "__cuda_local_var_23884_17_non_const_l1" is used before its value is set

assert_be.cu(15): warning: variable "l0" is used before its value is set

assert_be.cu(16): warning: variable "l1" is used before its value is set

### Assertion failure at line 1923 of ../../be/cg/cgemit.cxx:

### Compiler Error in file /tmp/tmpxft_00003fb9_00000000-7_assert_be.cpp3.i during Assembly phase:

### incorrect register class for operand 0
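For reference, here is a minimal sketch of the "initialize before use" workaround mentioned in the reply, applied to the repro kernel. It is not a fix for the compiler bug itself, and it has not been checked against the 3.2 toolchain. The helper names `store_be32` and `pack_block`, the `HD` macro, and the zero placeholder values are assumptions for illustration only; they do not appear in the original post. The helpers are written so that the byte layout can also be checked on the host with a plain C++ compiler.

```cpp
#include <stdint.h>

// The qualifiers only exist under nvcc; a plain C++ compiler sees an empty macro.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper: performs the same four byte stores as the l2n macro
// (most significant byte first).
HD inline void store_be32(uint32_t l, unsigned char *c) {
    c[0] = (unsigned char)((l >> 24) & 0xff);
    c[1] = (unsigned char)((l >> 16) & 0xff);
    c[2] = (unsigned char)((l >> 8) & 0xff);
    c[3] = (unsigned char)(l & 0xff);
}

// Hypothetical helper: writes two 32-bit words into the eight bytes of one
// 64-bit block, as the repro kernel does with its two l2n calls.
HD inline uint64_t pack_block(uint32_t l0, uint32_t l1) {
    uint64_t block = 0;
    unsigned char *bytes = (unsigned char *)&block;  // byte-wise access to block
    store_be32(l0, bytes);
    store_be32(l1, bytes + 4);
    return block;
}

#ifdef __CUDACC__
__global__ void demo_kernel(uint64_t *data) {
    // Same index expression as the TX macro, computed once.
    unsigned int tx = blockIdx.x * (blockDim.x * blockDim.y)
                    + (blockDim.y * threadIdx.x) + threadIdx.y;

    // Both words have defined values before they are read.
    // Zero is a placeholder; a real kernel would compute them.
    uint32_t l0 = 0, l1 = 0;

    // All eight bytes are overwritten, so the load of data[tx] from the
    // original repro is omitted here.
    data[tx] = pack_block(l0, l1);
}
#endif
```

Giving `l0` and `l1` defined values also silences the "used before its value is set" warnings shown in the transcript above. Whether the same approach avoids the assertion in a larger kernel is not established here.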