Hi.
I’m programming a CUDA kernel that computes a computational-geometry quantity from a set of points in the plane. The program divides the plane into a two-dimensional grid (like a window made of pixels). Inside the CUDA kernel, each of these “pixels” has to run a double loop over all the points in the initial set, so the work per pixel grows as O(n²), where n is the number of points.
The CUDA kernel looks something like this:
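__global__ void CUDA_kernel(float* points_list, int number_of_points, float* result)
{
    // Assumed: one thread per pixel, indexed in 1D into result.
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    float something = 0.0f;
    for (int i = 0; i < number_of_points; i++)
    {
        for (int j = i + 1; j < number_of_points; j++)
        {
            // (do something) with points i and j, updating `something`
        }
    }
    result[index] = something;
}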
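To make the growth concrete, here is a small host-side C++ sketch (not the actual kernel) that counts how often the body of the double loop runs for one pixel, and the total over a whole grid. The function names and the grid dimensions are hypothetical, and it assumes every pixel runs the full double loop.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Counts how many times the body of the double loop runs for one "pixel":
// i goes over all points, j over the points after i.
std::uint64_t pair_iterations(int number_of_points) {
    std::uint64_t count = 0;
    for (int i = 0; i < number_of_points; i++) {
        for (int j = i + 1; j < number_of_points; j++) {
            count++;  // stands in for "(do something)"
        }
    }
    return count;
}

// Closed form of the same count: n(n-1)/2 unordered pairs of points.
std::uint64_t pair_iterations_closed_form(std::uint64_t n) {
    return n == 0 ? 0 : n * (n - 1) / 2;
}

// Total loop-body executions for a width x height grid of pixels
// (hypothetical dimensions), assuming every pixel runs the full double loop.
std::uint64_t total_iterations(std::uint64_t width, std::uint64_t height, std::uint64_t n) {
    std::uint64_t pixels = width * height;
    std::uint64_t pairs = pair_iterations_closed_form(n);
    // Guard against overflowing the 64-bit product.
    assert(pairs == 0 || pixels <= std::numeric_limits<std::uint64_t>::max() / pairs);
    return pixels * pairs;
}
```

For example, pair_iterations(5) is 10, and since the count is about n²/2, doubling n roughly quadruples the work each pixel does.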
The problem appears (this is only my guess) when n grows and the double loop causes the total number of instructions inside the CUDA kernel to exceed a certain limit (the reference manual says a CUDA kernel can have at most 2 million native instructions). The PC freezes completely: I can still move the mouse, but I can't do anything except reset the machine.
Thinking that the number of instructions was the problem, I split the CUDA kernel into smaller loops, something like this:
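__global__ void CUDA_kernel(float* points_list, int number_of_points, float* result, int partition, int elements_in_partition)
{
    // Assumed: one thread per pixel, indexed in 1D into result.
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int begin = partition * elements_in_partition;
    // Clamp so the last partition does not go past number_of_points.
    int end = min(begin + elements_in_partition, number_of_points);
    float something = 0.0f;
    for (int i = begin; i < end; i++)
    {
        for (int j = i + 1; j < number_of_points; j++)
        {
            // (do something) with points i and j, updating `something`
        }
    }
    // Note: each launch overwrites result[index]; partial results from earlier
    // partitions must be combined (e.g. accumulated) to match the original kernel.
    result[index] = something;
}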
.....
elements_in_partition = something;
for (int partition = 0; partition < number_of_partitions; partition++)
{
    // The execution configuration is <<<grid dimensions, block dimensions>>>.
    CUDA_kernel<<<grid_dim, block_dim>>>(..., partition, elements_in_partition);
}
....
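The partitioned launch loop above can be checked on the host with a plain C++ sketch that mirrors how each launch picks its range of outer-loop indices. The helper names (number_of_partitions as a function, partition_range, partitioned_pair_iterations) are hypothetical. It uses ceiling division for the number of launches and clamps the last range, which matters when number_of_points is not a multiple of elements_in_partition, then confirms that the partitions together visit every (i, j) pair exactly as often as the single-launch version.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Half-open range [begin, end) of outer-loop indices handled by one launch.
struct PartitionRange {
    int begin;
    int end;
};

// Ceiling division: enough launches to cover every point, even when
// number_of_points is not a multiple of elements_in_partition.
int number_of_partitions(int number_of_points, int elements_in_partition) {
    assert(elements_in_partition > 0);
    return (number_of_points + elements_in_partition - 1) / elements_in_partition;
}

// Mirrors the kernel's `begin` computation, clamping `end` to number_of_points.
PartitionRange partition_range(int partition, int elements_in_partition, int number_of_points) {
    int begin = partition * elements_in_partition;
    int end = std::min(begin + elements_in_partition, number_of_points);
    return {begin, end};
}

// Counts inner-loop iterations over all launches; this equals n(n-1)/2
// when every (i, j) pair is visited exactly once across the partitions.
std::uint64_t partitioned_pair_iterations(int number_of_points, int elements_in_partition) {
    std::uint64_t count = 0;
    int partitions = number_of_partitions(number_of_points, elements_in_partition);
    for (int p = 0; p < partitions; p++) {  // one kernel launch per p
        PartitionRange r = partition_range(p, elements_in_partition, number_of_points);
        for (int i = r.begin; i < r.end; i++) {
            for (int j = i + 1; j < number_of_points; j++) {
                count++;  // stands in for "(do something)"
            }
        }
    }
    return count;
}
```

For example, with 10 points and 3 points per partition this gives 4 launches, the last one covering only index 9, and the total count is 45, the same n(n-1)/2 as the unpartitioned loop.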
But, surprisingly, the PC still freezes. Do you think the problem is the number of instructions inside the CUDA kernel? If that were the problem, why does the program run fine with a smaller number of points (n), even when that run involves more instructions than the second version posted here?
Could it be something else? Do you have any idea? :)
Thanks in advance.