Duplicate an array into a big array

onepieceking · August 26, 2008, 6:16am

Hi,

I have a float2 array of size n and another float2 array of size n*m. What I need to do is fill the second array with m back-to-back copies of the first array.

__global__ void
cudaCombineCopy_32fc(float *cuMainSrc, float *cuDst, int elementN, int count)
{
    int tid = threadIdx.x;
    int idx = (blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x;
    if (idx < elementN * count)
    {
        cuDst[idx] = cuMainSrc[tid];
        __syncthreads();
    }
}

void Main()
{
    cudaCombineCopy_32fc<<<Count, elementCount>>>
        (InputArray, OutputArray, elementCount, Count);
}

Is this method correct?

Tigga · August 28, 2008, 10:55am

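The __syncthreads() is both unnecessary and unsafe. It is unnecessary because the threads never share any data with each other. It is unsafe because any given __syncthreads() call must be reached by every thread in a block; here it sits inside the if, so if some threads of a block fail the bounds check while others pass it, the behaviour is undefined and the block may hang. Having said that, the compiler probably optimises it away.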
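For illustration, here is a minimal sketch of the same copy with the barrier removed. It is not the original poster's code: the names tileCopy and tileOnDevice are made up for this example, the pointers are assumed to be float2 (as the post describes) rather than float, and error checking on the CUDA calls is left out for brevity. It also assumes a grid-stride loop with the source index taken as i % n, so that n does not have to fit within the per-block thread limit the way it does when the kernel is launched with elementCount threads per block.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: writes m back-to-back copies of src[0..n) into dst[0..n*m).
__global__ void tileCopy(const float2 *src, float2 *dst, int n, int m)
{
    const long long total  = (long long)n * m;
    const long long stride = (long long)gridDim.x * blockDim.x;

    // Grid-stride loop: each thread handles elements i, i + stride, ...
    for (long long i = (long long)blockIdx.x * blockDim.x + threadIdx.x;
         i < total; i += stride)
    {
        // No __syncthreads() here: threads do not exchange data, and a
        // barrier inside a divergent branch would be unsafe anyway.
        dst[i] = src[i % n];
    }
}

// Hypothetical host helper: uploads src, runs the kernel, downloads the result.
// Error checking is omitted; real code should check every CUDA call.
std::vector<float2> tileOnDevice(const std::vector<float2> &src, int m)
{
    const int n = (int)src.size();
    std::vector<float2> out((size_t)n * m);
    if (out.empty()) return out;

    float2 *dSrc = nullptr, *dDst = nullptr;
    cudaMalloc(&dSrc, n * sizeof(float2));
    cudaMalloc(&dDst, out.size() * sizeof(float2));
    cudaMemcpy(dSrc, src.data(), n * sizeof(float2), cudaMemcpyHostToDevice);

    const int threads = 256;
    int blocks = (int)((out.size() + threads - 1) / threads);
    if (blocks > 65535) blocks = 65535;  // the loop in the kernel covers the rest

    tileCopy<<<blocks, threads>>>(dSrc, dDst, n, m);

    // This blocking copy waits for the kernel to finish first.
    cudaMemcpy(out.data(), dDst, out.size() * sizeof(float2), cudaMemcpyDeviceToHost);
    cudaFree(dSrc);
    cudaFree(dDst);
    return out;
}
```

Calling tileOnDevice(src, m) with an n-element vector should give back n*m elements in which element i equals src[i % n]. The block size of 256 is an arbitrary choice for the sketch, not a tuned value.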