compiler bug?

The following code is supposed to print ten 1’s, but it actually prints ten 0’s. I tried both CUDA 2.3 and the 3.0 beta and see the same result. I think this falls under the “class inheritance support” that CUDA 3.0 claims to have.

I expected b.get() to return the result of B::get_i(), but with NVCC it returns the result of A::get_i().

I think it is a name lookup problem.
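
To separate name lookup from anything NVCC-specific, the same hierarchy can be compiled on the host with an ordinary C++ compiler. The sketch below is host-only and drops the CUDA qualifiers; the types VA and VB and the helpers nonvirtual_result() and virtual_result() are hypothetical names added for illustration, not part of the code in this post. Under standard C++ rules, the unqualified call to get_i() inside A::get() is looked up in the scope of A and bound at compile time, so unless get_i() is virtual it resolves to A::get_i() even when get() is called on a B. If that is right, the 0’s may be what the language specifies rather than a compiler defect.

```cpp
// Host-only sketch of the hierarchy from this post (no CUDA qualifiers).
struct A {
    int get() { return get_i(); }      // unqualified call, looked up in A's scope
    int get_i() { return 0; }
};

struct B : public A {
    int get_i() { return 1; }          // hides A::get_i(), does not override it
};

// Hypothetical variant for contrast: get_i() is virtual here.
struct VA {
    virtual ~VA() {}
    int get() { return get_i(); }      // dispatched on the dynamic type
    virtual int get_i() { return 0; }
};

struct VB : public VA {
    int get_i() { return 1; }          // overrides VA::get_i()
};

// Hypothetical helpers that mirror what the kernel does with `B b; b.get();`.
int nonvirtual_result() { B b; return b.get(); }   // returns 0
int virtual_result()    { VB b; return b.get(); }  // returns 1
```

Whether virtual functions are usable in device code depends on the toolkit version and the target architecture, which this sketch does not address.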

Hardware: GTX280

OS: Redhat Enterprise Linux 5.3, x86

CUDA: 2.3 and 3.0 beta

#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>  // malloc, free

const int N = 10;

struct A {
    // get() calls get_i() without qualification.
    __device__ int get() { return get_i(); }
    __device__ int get_i() { return 0; }
};

struct B : public A {
    // Same name and signature as A::get_i().
    __device__ int get_i() { return 1; }
};

// Each thread stores the result of b.get() in its own slot.
__global__ void foo(int *arr) {
    B b;
    arr[threadIdx.x] = b.get();
}

int main() {
    int *arr;
    cudaMalloc((void **)&arr, sizeof(int) * N);
    foo<<<1, N>>>(arr);

    int *temp = (int *)malloc(sizeof(int) * N);
    cudaMemcpy(temp, arr, sizeof(int) * N, cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; i++)
        printf("%d\t", temp[i]);
    printf("\n");

    free(temp);
    cudaFree(arr);
    return 0;
}
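
If the goal is for get() to pick up the derived class's get_i() without virtual functions, one possible approach is the curiously recurring template pattern (CRTP), where the base class takes the derived type as a template parameter. The sketch below is host-only; ABase, B2, C2, crtp_result() and crtp_default_result() are hypothetical names, and I am only assuming that adding __device__ to the member functions would carry this over to a kernel (not tested against CUDA 2.3 or the 3.0 beta).

```cpp
// CRTP sketch: the base class knows the derived type at compile time.
template <typename Derived>
struct ABase {
    int get() {
        // Look up get_i() starting from the derived type instead of the base.
        return static_cast<Derived *>(this)->get_i();
    }
    int get_i() { return 0; }          // default used when Derived has no get_i()
};

struct B2 : public ABase<B2> {
    int get_i() { return 1; }          // found first by the cast in get()
};

struct C2 : public ABase<C2> {};       // no get_i(), falls back to the default

// Hypothetical helpers standing in for the kernel body.
int crtp_result()         { B2 b; return b.get(); }  // returns 1
int crtp_default_result() { C2 c; return c.get(); }  // returns 0
```

The dispatch is resolved entirely at compile time, so there is no vtable involved; the cost is that each derived class gets its own instantiation of the base.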