# CUDA kernel only works up to a specific maximum number of elements (array of N elements)

Hi,

When trying to find a specific number in an array (the smallest), my kernel works for some array lengths but not for all of them.

For example, if N=3560 or less it works, but if N=4560 or bigger it does not work.

Are there any limits I am not respecting?

Note: the result (the smallest number) is returned in the first element of the array on exit.

```
const int N=4560;

__global__ void minor(int *array)
{
    int tid=threadIdx.x + (blockIdx.x * blockDim.x);
    int nMiddle=(N/2)+1;

    while(nMiddle > 0)
    {
        if((tid < nMiddle) && (tid+nMiddle < N))
        {
            if(array[tid+nMiddle] < array[tid])
            {
                array[tid]=array[tid+nMiddle];
            }
        }

        if(nMiddle >= 3)
            nMiddle=(nMiddle/2) + 1;
        else
            nMiddle=(nMiddle/2);
    }
}

...

for (int i=0; i<N; i++)
{
    a[i] = N-i;
    printf("%d ",N-i);
}

ret=cudaMemcpy(dev_a,a,N*sizeof(int),cudaMemcpyHostToDevice);
if(ret != cudaSuccess)
{
    printf("Error cudaMemcpy 1\n");
    exit(1);
}

...
```

Env: Windows 7, NVIDIA Quadro FX 580

Thanks

JoseBB

`__syncthreads()` synchronizes only within a block, not between blocks, so your algorithm does not work in general.

Look at the reduction example in the SDK for how to implement this properly.

I assume your code is really meant to sort arrays, not just find the minimum.

What happens between N= 3560 and 4560?

When N=3560 it reports 1 as the minimum, which is correct because I initialize the array with the values 1 to N.
When N=4560 or more it reports 1250 as the minimum. I printed the whole returned array, and 1250 is in the first element.

Yes, the algorithm sorts in ascending order, and then I print array[0] as the minimum value of the array.

Well, if `__syncthreads()` only syncs inside each block, that may be the problem. Maybe I have to use a shared variable in each block, so that each block computes its own minimum and then the CPU takes the minimum of all the returned minima, right?

JoseBB
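
The approach described above (one minimum per block in shared memory, final minimum on the CPU) could be sketched roughly as follows. This is only a minimal sketch under a few assumptions, not the SDK reduction example: the names `blockMinKernel`, `minOnDevice` and `THREADS_PER_BLOCK` are made up for illustration, the block size is assumed to be a power of two, and the array is assumed to be non-empty.

```cuda
#include <climits>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical block size; must be a power of two for the halving loop below.
#define THREADS_PER_BLOCK 256

// Each block writes the minimum of its own slice of `in` to blockMin[blockIdx.x].
__global__ void blockMinKernel(const int *in, int n, int *blockMin)
{
    __shared__ int cache[THREADS_PER_BLOCK];

    int tid = threadIdx.x + blockIdx.x * blockDim.x;

    // Threads past the end of the array contribute INT_MAX (neutral for min).
    cache[threadIdx.x] = (tid < n) ? in[tid] : INT_MAX;
    __syncthreads();

    // Tree reduction inside one block. The barrier is valid here because
    // all cooperating threads belong to the same block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2)
    {
        if (threadIdx.x < stride)
            cache[threadIdx.x] = min(cache[threadIdx.x], cache[threadIdx.x + stride]);
        __syncthreads();
    }

    if (threadIdx.x == 0)
        blockMin[blockIdx.x] = cache[0];
}

// Hypothetical host helper: returns the minimum of host[0..n-1], assuming n > 0.
int minOnDevice(const int *host, int n)
{
    int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    int *devIn = NULL, *devBlockMin = NULL;
    int *partial = (int *)malloc(blocks * sizeof(int));

    cudaMalloc((void **)&devIn, n * sizeof(int));
    cudaMalloc((void **)&devBlockMin, blocks * sizeof(int));
    cudaMemcpy(devIn, host, n * sizeof(int), cudaMemcpyHostToDevice);

    blockMinKernel<<<blocks, THREADS_PER_BLOCK>>>(devIn, n, devBlockMin);

    // This copy waits for the kernel and also reports any earlier failure.
    cudaError_t ret = cudaMemcpy(partial, devBlockMin, blocks * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(devIn);
    cudaFree(devBlockMin);
    if (ret != cudaSuccess)
    {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(ret));
        exit(1);
    }

    // Final step on the CPU: minimum of the per-block minima.
    int result = partial[0];
    for (int i = 1; i < blocks; i++)
        if (partial[i] < result)
            result = partial[i];

    free(partial);
    return result;
}
```

With the initialization from the question (`a[i] = N-i`, N=4560), calling `minOnDevice(a, N)` should return 1. Unlike the original kernel, this version does not overwrite the input array, and no thread ever depends on a value written by another block.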