Slow execution

Hello Forum

I have a question about the performance of an application:

I have a simulator developed in CUDA/C++.
This code simulates the number of errors that a network can tolerate.

I launch a grid of 2666496 threads on the GPU.

The kernel has more or less this structure:

__global__ void kernel(....)
{
   int success_msg=0;

   for (int i = 0; i < 16; i++)
   {
       for(int j=1; j < 3; j++)
       {
         ...
       }

       for (int j = 3; j >= 0; j--)
       {
         ...
       }

       if (msgStatus == Dest_Reached)
       {
           success_msg = 1;
           break;
       }
   }

}

If I run this code, the execution takes 12 hours, but if I remove the last if, or at least its body (the success_msg assignment and the break), the simulation finishes in 3.41 minutes.

Why does this statement slow the code down so much?
How can I solve this problem?

Thank you

What is Dest_Reached? Is it important for the code? Do you use some threadfence function to block the execution at some point?

Hello

Dest_Reached is a value of an enum type that I define in a .h file, and msgStatus is a variable of that type.

enum MessageStatus
{
	In_Progress,
	Dest_Unreacheable,
	Dest_Reached,
	Wrong_Dest,
};

The problem is not with the if itself, but with its body.

If I write the code like this:

if (msgStatus == Dest_Reached)
{
    //here, empty
}
  • The execution is fine, but if I put a variable assignment or a ‘break’ inside the ‘if’, I get the problem again.

  • I am not blocking the code with any function.

Thank you for your prompt response

When the body of the if is empty, the compiler detects that and removes the if. Are these message status variables declared globally?
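
To illustrate the point about the compiler removing code: a compiler may drop any work whose result is never observed. If success_msg is never stored anywhere, removing the if body could let the compiler discard the loops as well, which would be one possible explanation for the much shorter run. That is an assumption, not something confirmed in this thread. The sketch below is plain host-side C++ standing in for the kernel; simulate_step, run_thread, count_successes, and the out array are hypothetical names, and the toy "hop count" model is invented for the example. Writing the result to an output buffer keeps the work observable, so timings with and without the if body compare the same computation.

```cpp
#include <vector>

enum MessageStatus { In_Progress, Dest_Unreacheable, Dest_Reached, Wrong_Dest };

// Hypothetical stand-in for the elided loop body: the message "arrives"
// once the attempt index reaches the thread's hop count.
MessageStatus simulate_step(int thread_id, int attempt) {
    int hops = thread_id % 20;  // assumed toy model, not the real simulator
    return attempt >= hops ? Dest_Reached : In_Progress;
}

// Host-side stand-in for one kernel thread. The result is stored in out[],
// so the compiler cannot treat the loop as dead code.
void run_thread(int thread_id, int* out) {
    int success_msg = 0;
    for (int i = 0; i < 16; i++) {
        MessageStatus msgStatus = simulate_step(thread_id, i);
        if (msgStatus == Dest_Reached) {
            success_msg = 1;
            break;
        }
    }
    out[thread_id] = success_msg;  // observable side effect
}

// Runs every "thread" and returns how many messages were delivered.
int count_successes(int num_threads) {
    std::vector<int> out(num_threads, 0);
    for (int t = 0; t < num_threads; t++) run_thread(t, out.data());
    int total = 0;
    for (int v : out) total += v;
    return total;
}
```

In a real kernel the equivalent would be a store to a global-memory array indexed by the thread id, assuming such an output array is passed to the kernel.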

The variable "msgStatus" is declared at the beginning of the function:

__global__ void kernel(....)
{
     int success_msg=0;
     MessageStatus msgStatus = In_Progress;
...

And what if you do it this way (instead of the if statement)?

success_msg = (msgStatus == Dest_Reached);
i = i + 16 * success_msg;

MK

Hi Guys

cmaster.matso: I have tested your code, but the problem persists.

I have been checking the code, and I don’t know why, but if I declare a variable inside the function and later use it inside an ‘if’, the application slows down.

Is there some bug or something else that prevents the app from executing normally?

This app runs on 4 GPUs at a time.

Then maybe every time you set msgStatus to Dest_Reached, set success_msg to 1 alongside it, and otherwise to 0. Additionally, change the outer loop condition to include success_msg, like this (provided it suits what the kernel needs to do):

...
for (int i = 0; i < 16 && !success_msg; ++i)
...

MK
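
The two rewrites suggested in this thread (the arithmetic jump i = i + 16 * success_msg and the !success_msg loop condition) should leave the loop after the same iteration as the original break, as long as success_msg is updated at the end of the loop body. Here is a minimal host-side C++ sketch to check that. dest_reached and reached_at are hypothetical stand-ins for the elided routing logic, and each function returns the number of iterations it executed.

```cpp
// Hypothetical stand-in for the elided loop body: reports success on
// iteration reached_at (any value outside 0..15 means "never reached").
static bool dest_reached(int i, int reached_at) { return i == reached_at; }

// Original form: leave the loop with break.
int iterations_with_break(int reached_at) {
    int iterations = 0;
    for (int i = 0; i < 16; i++) {
        iterations++;
        if (dest_reached(i, reached_at)) break;
    }
    return iterations;
}

// Arithmetic form: push i past the loop bound instead of branching.
int iterations_with_jump(int reached_at) {
    int iterations = 0;
    for (int i = 0; i < 16; i++) {
        iterations++;
        int success_msg = dest_reached(i, reached_at);
        i = i + 16 * success_msg;
    }
    return iterations;
}

// Loop-condition form: test the flag before every iteration.
int iterations_with_flag(int reached_at) {
    int iterations = 0;
    int success_msg = 0;
    for (int i = 0; i < 16 && !success_msg; ++i) {
        iterations++;
        success_msg = dest_reached(i, reached_at);
    }
    return iterations;
}
```

All three forms stop at the same point, so if the run time does not change between them, the style of the loop exit is unlikely to be the cause by itself.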

Hello

I have been looking at the code, and I don’t know whether the problem is that I use objects (classes) inside the kernel function.

Can I pass a class as a parameter to the kernel, or must I always pass a struct?

Thanks
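
On the class-versus-struct question: in C++, and therefore in CUDA C++, class and struct differ only in default member access and default inheritance access, so either can be a kernel parameter. What matters more is that kernel arguments are copied by value to the device, so a trivially copyable type without virtual functions is the safe choice. Below is a minimal sketch of a compile-time check in plain C++. RouterConfig, RouterState, PolymorphicRouter, and safe_kernel_argument are hypothetical names, not taken from the simulator.

```cpp
#include <type_traits>

// Hypothetical parameter type written as a class.
class RouterConfig {
public:
    int num_nodes;
    int max_hops;
    int hop_budget() const { return num_nodes * max_hops; }
};

// The same data layout written as a struct.
struct RouterState {
    int num_nodes;
    int max_hops;
};

// A type with a virtual function carries a host-side vtable pointer,
// which is not valid on the device.
class PolymorphicRouter {
public:
    virtual ~PolymorphicRouter() {}
    int num_nodes;
};

// True when T can be copied bytewise and has no virtual functions.
template <typename T>
constexpr bool safe_kernel_argument() {
    return std::is_trivially_copyable<T>::value && !std::is_polymorphic<T>::value;
}

static_assert(safe_kernel_argument<RouterConfig>(), "a plain class is fine");
static_assert(safe_kernel_argument<RouterState>(), "a plain struct is fine");
static_assert(!safe_kernel_argument<PolymorphicRouter>(), "virtual functions are not");
```

This check says nothing about performance; it only flags types that are risky to pass by value to a kernel.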