Pointer address inconsistency in CUDA 4.0: printf of memory address differs from block to block

Hi there.

I have a kernel running on a Tesla C2050 card that launches N blocks of Z threads each to handle N*Z*K elements of data, with the granularity set up so that each thread handles K elements and each block therefore handles Z*K. If base_addr is the address of the first element of the N*Z*K-element vector, then each block is given the address (base_addr + blockIdx.x*Z*K).
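A minimal sketch of that addressing scheme, assuming each block's slice starts Z*K elements after the previous one. The sizes Z and K, the helper block_offset, the kernel tag_slices, and the driver run_tag_slices are hypothetical names chosen for illustration; they are not part of the real code.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical sizes, chosen only for illustration.
constexpr unsigned int Z = 4;  // threads per block
constexpr unsigned int K = 2;  // elements handled by each thread

// Element offset of the slice owned by block b: b * Z * K.
__host__ __device__ inline unsigned int block_offset(unsigned int b) {
    return b * Z * K;
}

// Each block writes its own index into every element of its Z*K-element slice.
__global__ void tag_slices(unsigned int *base) {
    unsigned int *slice = base + block_offset(blockIdx.x);
    for (unsigned int k = 0; k < K; ++k)
        slice[threadIdx.x + k * Z] = blockIdx.x;
}

// Host driver: launches n_blocks blocks of Z threads and returns the tagged
// vector, or an empty vector if a CUDA call fails.
std::vector<unsigned int> run_tag_slices(unsigned int n_blocks) {
    std::vector<unsigned int> host(n_blocks * Z * K);
    const size_t bytes = host.size() * sizeof(unsigned int);
    unsigned int *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return {};
    tag_slices<<<n_blocks, Z>>>(dev);
    cudaError_t err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return {};
    return host;
}
```

Calling run_tag_slices(3) from main should give a vector in which each run of Z*K consecutive elements holds the index of the block that owns it.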

To handle the data in this fashion I have the following __global__ kernel, which calls two __device__ functions.

__global__ void BC(uint4 *Lr,uint4 *Lq,uint4 *Pi,unsigned int *word,unsigned int Niter){
	register unsigned int tid=threadIdx.x;
	register int i,j;

	// Thread 0 of each block prints the block index and the two base pointers.
	if(threadIdx.x==0)
		printf("%u: %u %u\n",blockIdx.x,Lq,Lr);

	for(j=0;j<Niter;j++)
	{
		// Each block works on its own slice of Lr and Lq.
		for(i=0;i<Mf;i++){
			Kernel1((Lr+SLR*blockIdx.x),(Lq+SLQ*blockIdx.x),tid+i*ZF);
			__syncthreads();
		}
		for(i=0;i<Nf;i++){
			Kernel2((Lr+SLR*blockIdx.x),(Lq+SLQ*blockIdx.x),(Pi+blockIdx.x*N),(word+blockIdx.x*N),tid+i*ZF);
			__syncthreads();
		}
	}
}

Basically SLR plays the role of Z*K for the Lr vector and SLQ plays the role of Z*K for the Lq vector. It is not exactly the same, since each block makes Mf and Nf passes over its data within the contiguous vectors Lr and Lq, but that detail is irrelevant here.

The printf output I get is not consistent:

4: 1790005706 13369344
0: 2173406091 13369344
3: 2173406091 13369344
1: 2173406091 13369344
5: 4285003268 13369344
2: 2173406091 13369344

The Lr address is printed as it should be, with the same value in every block, but the value printed for Lq differs between blocks. The differences are not consistent from run to run either: blocks 0, 1, 2 and 3 can also differ from one another.

I checked for any memory inconsistencies, running cuda-memcheck, and 0 errors were reported.

The above is the output in CUDA 4.0, although when I run it in CUDA 3.2 I get this:

5: 3722304989 5111808
4: 3722304989 5111808
0: 3722304989 5111808
1: 3722304989 5111808
3: 3722304989 5111808
2: 3722304989 5111808

Here the pointer addresses are consistent across all blocks!

Any help would be appreciated.

Thank you.

This does look like a genuine bug, but what happens if you print using:

printf("%u: %lu %lu\n",(unsigned int)blockIdx.x, (unsigned long)Lq, (unsigned long)Lr);
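For comparison, here is a minimal sketch of printing device pointers with a format specifier that matches the argument type. `%u` expects an unsigned int, while pointers are 8 bytes on a 64-bit build, so the format string and the arguments disagree; `%p`, or a cast to unsigned long long printed with `%llu`, avoids that mismatch. The names print_ptrs, ptr_bits and run_print_ptrs are hypothetical, and whether this mismatch explains the CUDA 4.0 output in this thread is not confirmed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Pointer value as a 64-bit unsigned integer, for printing with %llu.
__host__ __device__ inline unsigned long long ptr_bits(const void *p) {
    return (unsigned long long)(size_t)p;
}

// Thread 0 of each block prints both pointers, once with %p and once as a
// decimal value with %llu, so every argument matches its specifier.
__global__ void print_ptrs(const uint4 *Lr, const uint4 *Lq) {
    if (threadIdx.x == 0)
        printf("%u: Lq=%p (%llu) Lr=%p (%llu)\n", blockIdx.x,
               (const void *)Lq, ptr_bits(Lq),
               (const void *)Lr, ptr_bits(Lr));
}

// Hypothetical host driver: allocates two small buffers (sizes are arbitrary)
// and launches n_blocks blocks of 32 threads.
cudaError_t run_print_ptrs(unsigned int n_blocks) {
    uint4 *Lr = nullptr, *Lq = nullptr;
    cudaError_t err = cudaMalloc(&Lr, 16 * sizeof(uint4));
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&Lq, 16 * sizeof(uint4));
    if (err != cudaSuccess) { cudaFree(Lr); return err; }
    print_ptrs<<<n_blocks, 32>>>(Lr, Lq);
    err = cudaDeviceSynchronize();  // waits for the kernel and flushes device printf output
    cudaFree(Lq);
    cudaFree(Lr);
    return err;
}
```

Since the same two pointers are passed to every block, each block should print identical values for Lq and for Lr when the specifiers match the arguments. A cast to unsigned long is narrower than a pointer on platforms where long is 32 bits, which is why the sketch uses unsigned long long.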

With that printf and 5 blocks, this is what I got:

4: 18446744069423628288 14468034567615334600
1: 14178673872970842112 12225489209634957737
3: 18446744069423628288 17072280789786094828
2: 18446744069423628288 14033993530586874562
0: 11719107997048897536 11791448172606497699

In the first post I was getting inconsistent pointer addresses from the printf and inconsistent results. Now I get inconsistent pointer addresses but the results are OK (although that had to do with the host portion of the code).

This is kind of weird, right?