Is it possible to use printf() in an OpenACC parallel construct?

Hello. The OpenACC Getting Started Guide states in the section “Fortran I/O” that:
“Starting in PGI 15.5, print and write statements in device code are also supported when
used with the LLVM code generator -ta=llvm, or -Mcuda=llvm, and in combination with
the -mp compiler option.”

My question is: Is it possible to use the C function printf() inside a
#pragma acc parallel {…} construct?

If not, is there any other way to print from inside the parallel construct (that is, from the GPU) in C++?

Hi @and,

No, sorry. A printf call is not available in OpenACC device code for C/C++. Also note that Fortran only supports unformatted writes within OpenACC.

-Mat
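A common workaround when device-side printing is unavailable is to store the values of interest in a debug array inside the parallel region and print them on the host afterwards. The following is only a minimal sketch of that idea: the names `compute`, `print_debug`, `data`, and `dbg` are hypothetical, and the loop body is a placeholder. If the compiler does not process OpenACC, the pragma is ignored and the loop simply runs on the host.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical kernel: instead of calling printf on the device, each
 * iteration stores the value it would have printed into dbg[i]. */
void compute(float *data, float *dbg, int n)
{
    assert(n >= 0);  /* array section lengths must not be negative */

    /* data is copied to the device and back; dbg is only copied back. */
    #pragma acc parallel loop copy(data[0:n]) copyout(dbg[0:n])
    for (int i = 0; i < n; ++i) {
        float tmp = data[i] * 2.0f;  /* intermediate value to inspect */
        dbg[i] = tmp;                /* record it instead of printing */
        data[i] = tmp + 1.0f;
    }
}

/* Host-side printing, called after the parallel region has finished
 * and dbg has been copied back from the device. */
void print_debug(const float *dbg, int n)
{
    for (int i = 0; i < n; ++i)
        printf("i=%d tmp=%f\n", i, dbg[i]);
}
```

Calling `compute(data, dbg, n)` followed by `print_debug(dbg, n)` from host code prints one line per iteration. The cost is one extra array and a device-to-host copy, so this is better suited to debugging than to production runs.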

That’s a pity. As far as I know, printf() and cuPrintf() can be used in CUDA. Is it the OpenACC standard that forbids printing in device code?
Excuse me, but I found this information here:
https://parallel-computing.pro/index.php/11-openacc/53-using-cuda-device-functions-from-openacc.
Is it wrong?

Could you also answer another question? There is a loop in my code that is parallelized in the following simple way:
#pragma acc parallel loop present(particles)
for(int i=0; i<LIFE; ++i)
{…
50-80 sequential code lines, 2 or 3 small if constructions
}
In this loop I modify the elements (from 0 to LIFE) of the array particles, which resides on the GPU. There are no dependencies between the loop iterations.
My question is: Is this the most efficient way to parallelize the loop? I am unsure because I don’t specify the gang() and vector_length() clauses (or, possibly, the worker() clause).
Thank You.

Is it the OpenACC standard that forbids printing in device code?

The OpenACC standard doesn’t define this. It’s more a limitation of PGI’s implementation.

Is it wrong?

I haven’t tried this particular code, but I assume that it works. The key difference is that you’d be calling printf from CUDA, not OpenACC.

My question is: Is this the most efficient way to parallelize the loop? I am unsure because I don’t specify the gang() and vector_length() clauses (or, possibly, the worker() clause).

I’d just let the compiler schedule the loop. It typically does a good job, especially for a single loop level.

Of course, feel free to try various other schedules, but most likely the compiler will automatically find the optimal schedule.

-Mat
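To make the two options concrete, here is a rough sketch that contrasts a compiler-scheduled loop with an explicitly scheduled one. Everything in it is an assumption for illustration: the `Particle` type, the function names, the tiny loop body standing in for the real 50-80 lines, and the vector length of 128, which is just an arbitrary starting point for experiments. It uses `copy` instead of `present` so that it stands alone without an enclosing data region.

```c
#include <assert.h>

/* Hypothetical stand-in for the particle type used in the question. */
typedef struct {
    float x;  /* position */
    float v;  /* velocity */
} Particle;

/* Variant 1: no schedule clauses, so the compiler picks the mapping
 * of iterations to gangs, workers, and vector lanes. */
void update_auto(Particle *p, int n, float dt)
{
    assert(n >= 0);
    #pragma acc parallel loop copy(p[0:n])
    for (int i = 0; i < n; ++i) {
        p[i].x += p[i].v * dt;
        if (p[i].x < 0.0f) {  /* reflect at the boundary x = 0 */
            p[i].x = -p[i].x;
            p[i].v = -p[i].v;
        }
    }
}

/* Variant 2: explicit gang/vector schedule with a fixed vector length.
 * The value 128 is only an assumed starting point, not a recommendation. */
void update_explicit(Particle *p, int n, float dt)
{
    assert(n >= 0);
    #pragma acc parallel loop gang vector vector_length(128) copy(p[0:n])
    for (int i = 0; i < n; ++i) {
        p[i].x += p[i].v * dt;
        if (p[i].x < 0.0f) {
            p[i].x = -p[i].x;
            p[i].v = -p[i].v;
        }
    }
}
```

Both variants compute the same result; only the mapping onto the hardware differs. Timing each one on the real loop body is the reliable way to see whether an explicit schedule beats the compiler's choice.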