I have a question about the performance of an application.
I have a simulator developed in CUDA/C++.
The code simulates how many errors a network can tolerate.
I launch a grid of 2666496 threads on the GPU.
The kernel has roughly this structure:
__global__ void kernel(....)
{
    int success_msg = 0;
    for (int i = 0; i < 16; i++)
    {
        for (int j = 1; j < 3; j++)
        {
            ...
        }
        for (int j = 3; j >= 0; j--)
        {
            ...
        }
        if (msgStatus == Dest_Reached)
        {
            success_msg = 1;
            break;
        }
    }
}
If I run this code, the execution takes 12 hours, but if I remove the last if, or at least its body (the success_msg assignment and the break), the simulation finishes in 3.41 minutes.
Why does this statement slow the code down so much?
How can I solve this problem?
cmaster.matso: I have tested your code, but the problem persists.
I have been checking the code and I don't know why, but if I declare a variable inside the function and later use that variable in an 'if', the execution of the app slows down.
Is there some bug or something else that prevents the app from running normally?
Then maybe every time you set msgStatus to Dest_Reached, set success_msg to 1 alongside it, and to 0 otherwise. Additionally, change the outer loop condition to include success_msg, like this (provided it suits what the kernel needs to do):
...
for (int i = 0; i < 16 && !success_msg; ++i)
...
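To make the suggested restructuring concrete, here is a minimal host-side C++ sketch of the per-thread loop in both forms: the original one with break, and the one with the flag in the loop condition. It is only an illustration under assumptions. route_step, its reach_at parameter, the In_Transit value and the Result struct are hypothetical stand-ins for the elided kernel body, not part of the original code. It shows that both forms stop after the same number of iterations, not how either performs on the GPU.

```cpp
// Hypothetical status values; only Dest_Reached appears in the original post.
enum MsgStatus { In_Transit, Dest_Reached };

// Hypothetical result record so both loop variants can be compared.
struct Result {
    int success_msg;  // 1 if the destination was reached, 0 otherwise
    int iterations;   // outer-loop iterations actually executed
};

// Hypothetical stand-in for the elided routing work of one iteration:
// the message "arrives" at iteration reach_at (never, if reach_at is not in 0..15).
MsgStatus route_step(int i, int reach_at) {
    return (i == reach_at) ? Dest_Reached : In_Transit;
}

// Original shape: the flag is set inside the if and the loop is left with break.
Result run_with_break(int reach_at) {
    Result r{0, 0};
    for (int i = 0; i < 16; i++) {
        r.iterations++;
        MsgStatus msgStatus = route_step(i, reach_at);
        if (msgStatus == Dest_Reached) {
            r.success_msg = 1;
            break;
        }
    }
    return r;
}

// Suggested shape: the flag is updated every iteration and tested in the loop condition.
Result run_with_flag(int reach_at) {
    Result r{0, 0};
    for (int i = 0; i < 16 && !r.success_msg; ++i) {
        r.iterations++;
        MsgStatus msgStatus = route_step(i, reach_at);
        r.success_msg = (msgStatus == Dest_Reached) ? 1 : 0;
    }
    return r;
}
```

Calling run_with_break(5) and run_with_flag(5) both give success_msg equal to 1 after 6 iterations, so the change is purely structural. Whether it affects the kernel's run time has to be measured on the real code.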