Serial execution on GPU

I wish to execute a very simple kernel on the GPU, basically consisting of an if-else statement. The reason why I want it executed on the GPU is that the if-else statement evaluates the value of an array in device memory, which I don’t want to bring back to the host just for the if-else statement evaluation.

What is the best way to launch a kernel for sequential execution on the GPU with OpenACC? My current method is:

#pragma acc parallel present(ARRAY[0:length]) num_gangs(1) vector_length(1)
{
    if (ARRAY[0] == somevalue) { /* do_something */ }
    else { /* do_something else */ }
}


Yes, this should work. Are the “do_something” parts calls to routines? If so, be sure to either mark those routines with the OpenACC routine pragma or have them inlined.
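As a minimal sketch of the routine-pragma option, assuming the “do_something” parts are ordinary C functions: the helper names and their bodies below are placeholders, not the original poster's code, and the enclosing data region only stands in for an array that is already present on the device.

```c
#include <assert.h>

/* Hypothetical device-callable helpers; names and bodies are placeholders. */
#pragma acc routine seq
static void do_something(int *array) { array[0] += 1; }

#pragma acc routine seq
static void do_something_else(int *array) { array[0] -= 1; }

/* Evaluates the branch on the device with one gang and vector length 1. */
void branch_on_device(int *array, int length, int somevalue)
{
    assert(length > 0);
    /* Assumption: this data region stands in for data already on the device. */
    #pragma acc data copy(array[0:length])
    {
        #pragma acc parallel present(array[0:length]) num_gangs(1) vector_length(1)
        {
            if (array[0] == somevalue) {
                do_something(array);
            } else {
                do_something_else(array);
            }
        }
    }
}
```

The “routine seq” pragma asks the compiler to build a sequential device version of each helper so it can be called from inside the compute region.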

  • Mat

Suppose I have a big program that uses multiple “#pragma acc parallel” directives in different parts of the code. If I want to run the program sequentially on the GPU, do I have to add “num_gangs(1) num_workers(1) vector_length(1)” to each parallel directive, or is there a quicker way to run the program sequentially, such as a compiler flag?

The OpenACC 2.6 standard, which was ratified just a few months ago, added a new “acc serial” directive that offloads the section to the GPU and runs it sequentially. We’ll have support for “serial” in the upcoming 18.1 release.
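For a compiler that supports OpenACC 2.6, the original if-else test might look roughly like the sketch below. The function name and the returned flag are assumptions made for illustration, and the data region again stands in for an array that already lives on the device.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if the first element equals somevalue. */
int first_matches(const int *array, int length, int somevalue)
{
    int flag = 0;
    assert(length > 0);
    /* Assumption: this data region stands in for data already on the device. */
    #pragma acc data copyin(array[0:length])
    {
        /* "serial" runs the block on the device with a single thread. */
        #pragma acc serial present(array[0:length]) copy(flag)
        {
            if (array[0] == somevalue) {
                flag = 1;
            } else {
                flag = 0;
            }
        }
    }
    return flag;
}
```

Only the scalar flag travels back to the host here; the array itself stays on the device.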

For now, though, you’ll want to add “num_gangs(1) vector_length(1)” to force the loop to run sequentially. “num_workers(1)” is the default, so it doesn’t need to be added.
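A sketch of that workaround applied to a loop, assuming a running sum whose iterations depend on each other and so must execute in order. The function name is hypothetical, and the explicit “loop seq” is an addition here to make the intended ordering visible rather than something the reply above requires.

```c
#include <assert.h>

/* In-place running sum; each iteration reads the previous result. */
void running_sum_on_device(int *array, int length)
{
    assert(length > 0);
    /* Assumption: this data region stands in for data already on the device. */
    #pragma acc data copy(array[0:length])
    {
        #pragma acc parallel present(array[0:length]) num_gangs(1) vector_length(1)
        {
            /* One gang, vector length 1: the loop runs in order on the device. */
            #pragma acc loop seq
            for (int i = 1; i < length; ++i) {
                array[i] += array[i - 1];
            }
        }
    }
}
```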