Hi! I'm kind of a newbie at CUDA.
I'm coding a parallel version of a heuristic graph coloring algorithm, and the performance of my kernel is worrying me: the kernel execution time increases a lot when I increase the number of threads in the kernel launch:
64 Threads -> 1.62s Kernel execution time
128 Threads -> 1.64s Kernel execution time
256 Threads -> 1.77s Kernel execution time
512 Threads -> 2.21s Kernel execution time
1024 Threads -> 8.00s Kernel execution time
With 2048 or more threads, the kernel hits the time limit and fails with a "the launch timed out and was terminated" error.
My block and grid organization works like this:
```cpp
dim3 block,grid;
void threads_setup(aco_t* aco_info){
    n_threads = aco_info->n_threads;
    if( n_threads > 64){
        block.x = 8;
        block.y = 8;
        int dim = n_threads / 64;
        if (dim > 4){
            int dim2 = dim / 2;
            grid.x = dim2;
            grid.y = 2;
        }else{
            grid.x = dim;
        }
    } else {
        block.x = 8;
        block.y = 8;
        grid.x = 1;
    }
}
```
Like I said, I'm a newbie, so I wrote this threads_setup function this way just because I thought 64 threads in a 2D block was a nice way of doing it. My kernel doesn't make any use of the 2D layout, since I compute a flat threadID with this code:
```cpp
int threadID;
if(gridDim.y > 1){
    threadID = ((blockIdx.x * (blockDim.x * blockDim.y)) + (blockIdx.y * (blockDim.x * blockDim.y * gridDim.x))) + ((threadIdx.x * blockDim.x) + threadIdx.y);
} else {
    threadID = ((blockIdx.x * (blockDim.x * blockDim.y))) + ((threadIdx.x * blockDim.x) + threadIdx.y);
}
```
Another important detail: my kernel is made of 5 functions that are called many times by a main function inside the kernel, so the times above are the sum of all these functions.
That's everything I can imagine that could be slowing my code down so much. What is the real problem? Can the block and grid organization affect performance this much? Is it wrong to have many functions in a kernel? Is there an implicit sync barrier between function calls inside the kernel? Or is none of that the reason, and the problem lies within my kernel itself?
If someone could point me in the right direction, it would really help! Thanks for your attention!