Hi there.
I have a kernel running on a Tesla C2050 card that launches N blocks of Z threads each to process N*Z*K elements of data, and I set up my data granularity so that each thread handles K elements (that is, Z*K elements per block). Furthermore, if base_addr is the address of the first element of the N*Z*K-element vector in memory, then each block is given the following address: (base_addr + blockIdx.x*Z*K).
To handle the data in this fashion, I have the following __global__ kernel, which calls two other device functions.
__global__ void BC(uint4 *Lr, uint4 *Lq, uint4 *Pi, unsigned int *word, unsigned int Niter)
{
    register unsigned int tid = threadIdx.x;
    register int i, j;

    // One thread per block prints the Lq and Lr pointers the kernel received.
    if (threadIdx.x == 0)
        printf("%u: %u %u\n", blockIdx.x, Lq, Lr);

    for (j = 0; j < Niter; j++) {
        // Each block works on its own slice of Lr and Lq.
        for (i = 0; i < Mf; i++) {
            Kernel1((Lr + SLR * blockIdx.x), (Lq + SLQ * blockIdx.x), tid + i * ZF);
            __syncthreads();
        }
        for (i = 0; i < Nf; i++) {
            Kernel2((Lr + SLR * blockIdx.x), (Lq + SLQ * blockIdx.x),
                    (Pi + blockIdx.x * N), (word + blockIdx.x * N), tid + i * ZF);
            __syncthreads();
        }
    }
}
Basically, SLR is Z*K for the Lr vector and SLQ is Z*K for the Lq vector. This is not exactly true, since each block iterates Mf and Nf times, respectively, over its own data within the contiguous vectors Lr and Lq, but that detail is not really relevant here.
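The per-block addressing described above can be sketched as follows. This is only an illustration under assumptions: `block_base`, `block_stride` and the `HOST_DEVICE` macro are hypothetical names that do not appear in the kernel, and the stride is taken to be exactly Z*K, which the real SLR and SLQ only approximate. The code is written so that it compiles both as plain C++ and under nvcc.

```cpp
#include <cstddef>

// Lets the same helpers compile as plain C++ and as CUDA host/device code.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: number of elements owned by one block,
// assuming Z threads per block and K elements per thread.
HOST_DEVICE inline std::size_t block_stride(std::size_t Z, std::size_t K)
{
    return Z * K;
}

// Hypothetical helper: address of the first element owned by block `block`,
// i.e. base_addr + block * stride (pointer arithmetic in units of T).
template <typename T>
HOST_DEVICE inline T *block_base(T *base_addr, unsigned int block, std::size_t stride)
{
    return base_addr + static_cast<std::size_t>(block) * stride;
}
```

Inside the kernel this would correspond to something like `block_base(Lr, blockIdx.x, SLR)`, which is the same expression as `Lr + SLR * blockIdx.x`.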
The printf output I get is not consistent:
4: 1790005706 13369344
0: 2173406091 13369344
3: 2173406091 13369344
1: 2173406091 13369344
5: 4285003268 13369344
2: 2173406091 13369344
The Lr address is printed as expected, as a constant value, but the Lq address differs depending on which block prints it (the differences are not consistent either; blocks 0, 1, 2 and 3 can also differ in their printed address).
I checked for memory errors by running cuda-memcheck, and 0 errors were reported.
The above is the output with CUDA 4.0. When I run the same code with CUDA 3.2, I get this:
5: 3722304989 5111808
4: 3722304989 5111808
0: 3722304989 5111808
1: 3722304989 5111808
3: 3722304989 5111808
2: 3722304989 5111808
Here the pointer address is consistent across all blocks!
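One detail that may be worth ruling out is the format string itself: `%u` is the conversion for `unsigned int`, while `Lq` and `Lr` are pointers, which are 8 bytes wide on a 64-bit build. The sketch below is an assumption about a possible cause, not a confirmed diagnosis. `print_ptrs`, `format_pointer` and `low32` are hypothetical names introduced only for this illustration. The kernel part is compiled only under nvcc, and the host helpers show what is lost when only 32 bits of a pointer are looked at.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Host-side helper (hypothetical): formats the full pointer value as 16 hex digits.
inline std::string format_pointer(const void *p)
{
    char buf[32];
    std::snprintf(buf, sizeof buf, "0x%016llx",
                  static_cast<unsigned long long>(reinterpret_cast<std::uintptr_t>(p)));
    return std::string(buf);
}

// Host-side helper (hypothetical): the low 32 bits of a pointer, which is
// the most a 32-bit conversion such as %u can represent.
inline std::uint32_t low32(const void *p)
{
    return static_cast<std::uint32_t>(reinterpret_cast<std::uintptr_t>(p) & 0xFFFFFFFFu);
}

#ifdef __CUDACC__
// Device-side variant of the debug print, using %p for the pointer arguments.
__global__ void print_ptrs(const void *Lq, const void *Lr)
{
    if (threadIdx.x == 0)
        printf("%u: %p %p\n", blockIdx.x, Lq, Lr);
}
#endif
```

If the values printed with `%p` turn out to be identical in every block, the varying numbers above would come from the format mismatch rather than from the pointers themselves.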
Any help would be appreciated.
Thank you.