Hi forum,
I’ve been stuck on a bug in the nvcc compiler. I’ve tried to reduce the problem to a minimal case and included a sample kernel below.
When compiling this using ‘nvcc’ (without options) I get the following error:
### Assertion failure at line 1923 of ../../be/cg/cgemit.cxx:
### Compiler Error in file /tmp/tmpxft_00003f79_00000000-7_bug.cpp3.i during Assembly phase:
### incorrect register class for operand 0
nvopencc INTERNAL ERROR: /opt/cuda/open64/lib//be returned non-zero status 1
The code looks like this:
#include <stdlib.h>
#include <stdint.h>

#define TX blockIdx.x * (blockDim.x * blockDim.y) + (blockDim.y * threadIdx.x) + threadIdx.y
#define l2n(l,c) (*(c)=(unsigned char)(((l)>>24L)&0xff), \
                  *(c+1)=(unsigned char)(((l)>>16L)&0xff), \
                  *(c+2)=(unsigned char)(((l)>> 8L)&0xff), \
                  *(c+3)=(unsigned char)(((l) )&0xff))

__global__ void demo_kernel(uint64_t *data) {
    uint32_t x2,x4,l0,l1;
    //l0=l1=0;
    uint64_t block = 0;
    block = data[TX];
    l2n(l0,(unsigned char *)&block);
    l2n(l1,((unsigned char *)&block)+4);
    data[TX] = block;
}

int main(int argc, char **argv) {
    return 0;
}
This problem vanishes if you uncomment the l0=l1=0 line. Unfortunately, this does not work for my original kernel, which is why I haven’t found a workaround yet, despite desperate attempts.
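For anyone reading the macro: l2n stores a 32-bit value as four bytes, most significant byte first, and in the kernel above l0 and l1 are read without ever being assigned unless the commented-out line is enabled. The following is only a host-side sketch in plain C of the same byte packing written as functions instead of a macro. The names store_be32, pack_block and check_pack_block are hypothetical and not part of the original kernel, and whether a variant like this avoids the compiler error is untested.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: writes v into out[0..3], most significant byte
   first, which is the byte order the l2n macro produces. */
static inline void store_be32(uint32_t v, unsigned char *out) {
    out[0] = (unsigned char)((v >> 24) & 0xff);
    out[1] = (unsigned char)((v >> 16) & 0xff);
    out[2] = (unsigned char)((v >>  8) & 0xff);
    out[3] = (unsigned char)( v        & 0xff);
}

/* Hypothetical helper: fills an 8-byte block from two 32-bit words,
   mirroring the two l2n calls in the kernel (l0 at offset 0, l1 at 4). */
static inline void pack_block(uint32_t l0, uint32_t l1, unsigned char out[8]) {
    store_be32(l0, out);
    store_be32(l1, out + 4);
}

/* Usage example with explicitly initialized inputs. */
static inline void check_pack_block(void) {
    unsigned char buf[8] = {0};
    pack_block(0x01020304u, 0xA1B2C3D4u, buf);
    assert(buf[0] == 0x01 && buf[3] == 0x04);
    assert(buf[4] == 0xA1 && buf[7] == 0xD4);
}
```

Because the values are passed as function arguments, every input has to be initialized by the caller before the bytes are written.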
I was able to reproduce this problem on both my workstation and my laptop:
Workstation: Gentoo Linux, CUDA-SDK 3.2 and CUDA-Toolkit 3.2, gcc 4.5.1, Intel Pentium D (x86_64), 3GB RAM, GeForce 8600 GT (CC 1.1)
Laptop: Mac OS X 10.6, CUDA-SDK 3.2 and CUDA-Toolkit 3.2, gcc 4.2.1, Intel Core 2 Duo, 2GB RAM, GeForce 9400M (CC 1.1)
I found this problem while working on a bigger kernel, which I don’t want to post since it’s still very convoluted and shows my inexperience with CUDA.