I have just run into a bug in NVCC. I tried NVCC 2.1 on both x86 and x86_64, and both fail.
Compile and run the following code and the result is wrong: the output is supposed to be 111, but on x86_64 it gives 0, and on x86 it prints a garbage value. Device emulation mode works fine, though.
Compile options:
nvcc test.cu -arch=compute_13
The code is as follows:
//==============
// test.cu
#include <cuda.h>
#include <stdio.h>

//==============================
// array based lock-free queue
template <unsigned int N>
class LFQueue{
public:
    __device__ LFQueue():head(0),tail(0),count(0){}
    __device__ void push(void * p){
        unsigned int index=atomicInc(&tail,N-1); // get the next index
        array[index]=p;
        atomicInc(&count,N); // increase the number of available elements
    }
public:
    void * array[N];
    unsigned int head;
    unsigned int tail;
    unsigned int count;
};

__global__ void foo(int * arr){
    LFQueue<2> q;
    int id=blockIdx.x*blockDim.x+threadIdx.x;
    q.push(NULL);
    arr[id]=111;
}

const int numBlocks=1;
const int numThreads=1;
const int N=numBlocks*numThreads;

int main(){
    cudaSetDevice(0);
    int * h_arr=(int *)malloc(sizeof(int)*N);
    int * d_arr;
    cudaMalloc((void **)&d_arr,sizeof(int)*N);
    foo<<<numBlocks,numThreads>>>(d_arr);
    cudaMemcpy(h_arr,d_arr,sizeof(int)*N,cudaMemcpyDeviceToHost);
    for(int i=0;i<N;i++){
        printf("%d\t",h_arr[i]);
        if(i%8==0)
            printf("\n");
    }
    return 0;
}
I have just tried modifying the code to get rid of the templates and the class, but the error persists. Could someone tell me how to work around this bug?
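For reference, even a stripped-down kernel along these lines (a rough sketch of the kind of modification I mean, not my exact code) shows the same behaviour:

// Same pattern with no templates and no class: atomicInc is still applied
// to the addresses of per-thread local variables.
__global__ void foo_plain(int * arr){
    unsigned int tail=0;
    unsigned int count=0;
    void * array[2];
    int id=blockIdx.x*blockDim.x+threadIdx.x;
    unsigned int index=atomicInc(&tail,1);   // get the next index
    array[index]=NULL;
    atomicInc(&count,2);                     // increase the number of available elements
    arr[id]=111;
}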
What is going on here is that atomicInc operates on the global address space, but the program is written as if it could operate on thread-local data: the &tail and &count it receives point into the per-thread storage for q. As a result, those atomic operations end up stepping on the global address space where d_arr happens to be allocated. The compiler is free to decide how to place q, and where d_arr lands can differ between system configurations, which explains the runtime differences you noticed.
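One possible workaround, sketched under the assumption that a single queue shared by the whole grid is what you want (the names d_q/q and the changed kernel signature are mine, not from the original code): give the queue global-memory storage via cudaMalloc, so that the &tail and &count handed to atomicInc are genuine global addresses. The members are zeroed with cudaMemset, since the constructor never runs for cudaMalloc'ed storage. Together with the LFQueue definition and the constants from the code above, the kernel and main() would become something like:

__global__ void foo(int * arr, LFQueue<2> * q){
    int id=blockIdx.x*blockDim.x+threadIdx.x;
    q->push(NULL);     // atomicInc now targets global memory
    arr[id]=111;
}

int main(){
    cudaSetDevice(0);
    int * h_arr=(int *)malloc(sizeof(int)*N);
    int * d_arr;
    LFQueue<2> * d_q;
    cudaMalloc((void **)&d_arr,sizeof(int)*N);
    cudaMalloc((void **)&d_q,sizeof(LFQueue<2>));
    cudaMemset(d_q,0,sizeof(LFQueue<2>));    // head=tail=count=0, matching the constructor
    foo<<<numBlocks,numThreads>>>(d_arr,d_q);
    cudaMemcpy(h_arr,d_arr,sizeof(int)*N,cudaMemcpyDeviceToHost);
    for(int i=0;i<N;i++){
        printf("%d\t",h_arr[i]);
        if(i%8==0)
            printf("\n");
    }
    cudaFree(d_q);
    cudaFree(d_arr);
    free(h_arr);
    return 0;
}

Placing the queue in __shared__ memory and using shared-memory atomics (compute 1.2 and above) would be another option, but the global-memory version keeps the structure closest to the original.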