NVCC: "Assertion failure..."

I ran into the following assertion error when I tried to compile my code:

Assertion failure at line 1543 of …/…/be/cg/cgemit.cxx:

Compiler Error in file /tmp/tmpxft_00002b38_00000000-7_testPA.cpp3.i during Assembly phase:

incorrect register class for operand 0

nvopencc INTERNAL ERROR: /usr/local/cuda/open64/lib//be returned non-zero status 1

I tried NVCC 2.1 on both Fedora9_x86_64 and EL5_x86 and got the same error on each.
Compile command:
nvcc testPA.cu -arch=compute_13

The code is as follows:

//==========================
// PA.h
#ifndef PA_H
#define PA_H

#include <cuda.h>

namespace MACU{
typedef unsigned int uint32;
typedef int int32;
typedef unsigned long long uint64;

struct __align__(8) BlockListHeadDesc{ 
	int32 first_avail:32;	
	uint64 tag:32;	
};

struct BlockList{
	BlockListHeadDesc head_desc;
	__device__ int32 pop();
};

struct Superblock{
	BlockList avail_list;	
	__device__ void * get_block();		
};
__device__ int32 BlockList::pop(){
	BlockListHeadDesc new_desc,old_desc;
	while(1){
		new_desc=old_desc=head_desc;
		if(old_desc.first_avail==-1)
			return -1;	
		new_desc.tag++;
		if(*(uint64*)&old_desc==atomicCAS((uint64*)&head_desc,*(uint64*)&old_desc,*(uint64*)&new_desc))
			return 0;
	}
}
__device__ void * Superblock::get_block(){
	while(1){
		uint32 block_num=avail_list.pop();
		return NULL;
	}	
}

}
#endif

//=============================
// testPA.cu
#include "PA.h"
#include <cuda.h>
#include <stdio.h>

using namespace MACU;

__global__ void foo(int * arr){
	Superblock * sb=(Superblock *)(NULL);
	void * p=sb->get_block();
}
int main(){
	return 0;
}

I’ve reproduced this assertion, and opened bug 529788 to have it investigated further. Thanks for reporting this.

Note that the problem only occurs because the bitfield is declared uint64 but is 32 bits in size, so as a workaround one could declare the bitfield uint32 and the problem would go away.

This is still a bug; I'm just suggesting a workaround that lets you proceed in the meantime.
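
For reference, the suggested workaround might look like the sketch below. It only changes the declared type of tag from uint64 to uint32 and keeps everything else from PA.h. The desc_bits helper is hypothetical (it is not in the original code) and is there only to check on the host that the descriptor still fits the single 64-bit word that atomicCAS operates on. The __align__ stand-in macro is an assumption so that the snippet also builds as plain C++ without nvcc.

```cpp
#include <cstring>

#ifndef __CUDACC__
#define __align__(n) alignas(n)  // stand-in so the sketch builds without nvcc
#endif

typedef unsigned int uint32;
typedef int int32;
typedef unsigned long long uint64;

// Workaround layout: the 32-bit tag bitfield is declared with a 32-bit type
// (uint32) instead of uint64.
struct __align__(8) BlockListHeadDesc {
    int32 first_avail : 32;
    uint32 tag : 32;
};

// The compare-and-swap in BlockList::pop() treats the descriptor as one
// 64-bit word, so its size must stay at 8 bytes.
static_assert(sizeof(BlockListHeadDesc) == sizeof(uint64),
              "descriptor must fit in one 64-bit word");

// Hypothetical host-side helper: copy the descriptor into a 64-bit word.
inline uint64 desc_bits(const BlockListHeadDesc &d) {
    uint64 bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

With this layout the tag wraps around after 2^32 increments, which is the limitation to keep in mind if a wider tag is needed.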

Thanks for the suggestion.

I actually wanted to use more bits for the tag field; that's why I used uint64 instead of uint32.
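
One possible way to get a wider tag without bitfields is to pack both values into a single uint64 by hand, as in the sketch below. The 24/40 bit split and the names INDEX_BITS, pack_desc, desc_index, desc_tag and bump_tag are assumptions made for illustration; none of them come from the original code. Whether this avoids the assertion in nvopencc has not been confirmed in this thread. The functions are written as plain C++; in device code they would need __device__ qualifiers.

```cpp
typedef unsigned long long uint64;

// Assumed split for illustration: low 24 bits hold the block index,
// the remaining 40 bits hold the tag.
const unsigned INDEX_BITS = 24;
const uint64 INDEX_MASK = (1ULL << INDEX_BITS) - 1;

// All-ones index plays the role of first_avail == -1 (empty list).
const uint64 EMPTY_INDEX = INDEX_MASK;

// Combine an index and a tag into one 64-bit descriptor word.
inline uint64 pack_desc(uint64 index, uint64 tag) {
    return (tag << INDEX_BITS) | (index & INDEX_MASK);
}

// Extract the block index from a descriptor word.
inline uint64 desc_index(uint64 desc) { return desc & INDEX_MASK; }

// Extract the tag from a descriptor word.
inline uint64 desc_tag(uint64 desc) { return desc >> INDEX_BITS; }

// Increment the tag and leave the index untouched; the tag wraps to 0
// after 2^40 increments because the addition overflows the 64-bit word.
inline uint64 bump_tag(uint64 desc) { return desc + (1ULL << INDEX_BITS); }
```

The resulting word can be passed to the 64-bit atomicCAS directly, so the pointer casts on the struct are no longer needed.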

Thanks, netlama.

In which version of the compiler will this bug fix be applied? Do I have to wait till the next major release, say CUDA 2.2, or is there some kind of minor release that I can get?

At the earliest, it would be CUDA 2.2.