OpenACC "streams" on multicore

Hi,
I am working on an OpenACC code that uses multiple streams (async queues) to overlap computation and communication when running on GPUs.
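
For illustration only, here is a minimal sketch of this kind of pattern. It is not the actual code: the function name, the array names and the "double the interior" update are all made up. One queue packs the boundary values into a send buffer while another queue updates the interior.

```c
#include <assert.h>

/* Hypothetical example: pack the halo cells of a 1D field on queue 2
   while the interior is updated on queue 1. All names are illustrative. */
void step_with_two_queues(double *restrict u, double *restrict sendbuf,
                          int n, int halo)
{
    assert(halo > 0 && n > 2 * halo);

    #pragma acc data copy(u[0:n]) create(sendbuf[0:2*halo])
    {
        /* Queue 2: copy the first and last `halo` cells into the buffer. */
        #pragma acc parallel loop async(2)
        for (int i = 0; i < halo; ++i) {
            sendbuf[i]        = u[i];
            sendbuf[halo + i] = u[n - halo + i];
        }
        /* Queue 2: bring the packed buffer back to the host. */
        #pragma acc update self(sendbuf[0:2*halo]) async(2)

        /* Queue 1: interior update, independent of the halo cells. */
        #pragma acc parallel loop async(1)
        for (int i = halo; i < n - halo; ++i)
            u[i] = 2.0 * u[i];

        /* The buffer is ready once queue 2 has drained; the communication
           (e.g. an MPI send of sendbuf) would be started here. */
        #pragma acc wait(2)

        /* Wait for the interior before leaving the data region. */
        #pragma acc wait(1)
    }
}
```

The two loops touch disjoint parts of `u`, so the result is the same whether the queues actually run concurrently or one after the other.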

When the code is compiled for a multicore target and run on a CPU, how are the different streams handled? Are async clauses simply ignored, so that every parallel region executes as a blocking call, or is some sort of concurrency attempted if there are enough cores/threads?

Thanks and Best Regards,

Enrico

Hi Enrico,

Currently, OpenACC async clauses are ignored when targeting a multicore device. However, we are exploring how effective it would be to enable them. In most cases async is used to overlap data movement with compute, which doesn’t matter for multicore since there’s no data movement. But we are looking for other cases where async could help.

-Mat

Hi Mat,

In our case it would be beneficial for overlapping computation with MPI communication between compute nodes.

In general, it would be nice to be able to run multiple “kernels” concurrently when each one is unable to exploit all the available cores. It could be particularly interesting for newer CPUs with many cores.
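
As a rough sketch of what I mean (hypothetical function and array names, not taken from our code), consider two small, independent loops placed on separate async queues. A runtime that honors async could run them side by side on different groups of cores; if async is ignored, they simply run back to back.

```c
#include <assert.h>

/* Hypothetical example: two small, independent loops on separate async
   queues. Neither loop is assumed to be large enough to keep all cores busy. */
void run_two_small_kernels(float *restrict a, int na,
                           float *restrict b, int nb)
{
    assert(na > 0 && nb > 0);

    #pragma acc data copy(a[0:na], b[0:nb])
    {
        /* Queue 1: scale the first array. */
        #pragma acc parallel loop async(1)
        for (int i = 0; i < na; ++i)
            a[i] = 2.0f * a[i];

        /* Queue 2: shift the second array; no dependency on queue 1. */
        #pragma acc parallel loop async(2)
        for (int i = 0; i < nb; ++i)
            b[i] = b[i] + 1.0f;

        /* Wait for all queues before the data region ends. */
        #pragma acc wait
    }
}
```

Since the loops share no data, the final values of `a` and `b` do not depend on whether the two queues overlap.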


Enrico

We are exploring this, but without a compelling use case these items tend to be pushed to a lower priority. Are you able to share your code? That may help in getting it on next year’s roadmap.

-Mat