I'm working in a kernel with an array that holds the results, but when the array gets too long the app runs too slowly (again).
When the array has more than 340000 elements, the GPU takes a long time to perform any operation.
At the moment I have this code in the kernel:
__global__ void cudaEvaluate(int size, int nb_msgs, int* dev_Results)
{
    // Flatten the 2D thread coordinates into one linear index.
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int grid_width = gridDim.x * blockDim.x;
    int tid = y * grid_width + x;

    // Threads past the end of the array have nothing to do.
    if (tid >= size)
        return;

    dev_Results[tid] = 0;
    for (int i = 0; i < nb_msgs; i++)
    {
        dev_Results[tid] = tid; // This is only to make a test
    }
}
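For context, here is a minimal sketch of how a kernel like this could be launched and timed with CUDA events, so the slowdown can be measured as the array grows. The launch shape (16x16 threads per block, 64 blocks wide) and the value of nb_msgs are assumptions for illustration and are not given above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same indexing as the kernel above; the loop body is still a placeholder.
__global__ void cudaEvaluate(int size, int nb_msgs, int* dev_Results)
{
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int grid_width = gridDim.x * blockDim.x;
    int tid = y * grid_width + x;
    if (tid >= size)
        return;
    dev_Results[tid] = 0;
    for (int i = 0; i < nb_msgs; i++)
        dev_Results[tid] = tid;
}

int main()
{
    const int size = 340000;   // assumed: the length where the slowdown appears
    const int nb_msgs = 1000;  // assumed value, not given in the post

    int* dev_Results = nullptr;
    cudaMalloc(&dev_Results, size * sizeof(int));

    // Assumed launch shape: 16x16 threads per block, 64 blocks wide,
    // and enough block rows to cover `size` elements.
    dim3 block(16, 16);
    int threadsPerRow = 64 * block.x;                       // 1024 threads in x
    int rows = (size + threadsPerRow - 1) / threadsPerRow;  // thread rows needed
    dim3 grid(64, (rows + block.y - 1) / block.y);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time only the kernel, not the allocation.
    cudaEventRecord(start);
    cudaEvaluate<<<grid, block>>>(size, nb_msgs, dev_Results);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %.3f ms (%s)\n", ms,
           cudaGetErrorString(cudaGetLastError()));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dev_Results);
    return 0;
}
```

Running this for several values of size and nb_msgs would show whether the time grows roughly with size * nb_msgs, which is the total number of loop iterations across all threads.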
Now, as the vector grows, the kernel takes longer to run.
What can I do to get better performance as this array keeps growing?
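One direction that may be worth testing is sketched below: the test kernel stores to global memory on every loop iteration, so a variant could keep the running value in a local variable and store it once at the end. The kernel name cudaEvaluateLocal, the launch shape, and the nb_msgs value are hypothetical. Whether this helps depends on what the real loop body does and on what the compiler already optimizes, so it is a thing to measure rather than a guaranteed fix.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical variant: accumulate in a local variable (a register),
// then write to global memory once per thread.
__global__ void cudaEvaluateLocal(int size, int nb_msgs, int* dev_Results)
{
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int tid = y * (gridDim.x * blockDim.x) + x;
    if (tid >= size)
        return;

    int result = 0;
    for (int i = 0; i < nb_msgs; i++)
        result = tid;            // placeholder work, as in the test kernel
    dev_Results[tid] = result;   // single global store
}

int main()
{
    const int size = 340000;   // assumed array length
    const int nb_msgs = 1000;  // assumed value, not given in the post

    int* dev_Results = nullptr;
    cudaMalloc(&dev_Results, size * sizeof(int));

    // Assumed launch shape: (64*16) x (21*16) = 344064 threads >= size.
    dim3 block(16, 16);
    dim3 grid(64, 21);
    cudaEvaluateLocal<<<grid, block>>>(size, nb_msgs, dev_Results);

    // Copy back and check that every element holds its own index.
    std::vector<int> host(size);
    cudaMemcpy(host.data(), dev_Results, size * sizeof(int),
               cudaMemcpyDeviceToHost);
    bool ok = true;
    for (int i = 0; i < size; i++)
        if (host[i] != i) { ok = false; break; }
    printf(ok ? "results match\n" : "results differ\n");

    cudaFree(dev_Results);
    return 0;
}
```

The final contents of dev_Results are the same as in the original kernel for any nb_msgs, including zero, since the local variable starts at 0 just as the array element does.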