Is it possible to use printf() in an OpenACC parallel construct?

_and · December 13, 2017, 4:28pm

Hello. The OpenACC Getting Started Guide states in its “Fortran I/O” section:
“Starting in PGI 15.5, print and write statements in device code are also supported when
used with the LLVM code generator -ta=llvm, or -Mcuda=llvm, and in combination with
the -mp compiler option.”
My question is: is it possible to call the C function printf() inside a
#pragma acc parallel {…} construct?
If not, is there any way to print from inside the parallel construct (that is, from the GPU) in C++?

MatColgrove · December 13, 2017, 4:58pm

Hi @and,

No, sorry. A printf call is not available in OpenACC device code for C/C++. Also note that Fortran only supports unformatted writes within OpenACC.

-Mat

_and · December 14, 2017, 3:21pm

That’s a pity. As far as I know, printf() and cuPrintf() can be used in CUDA. Is it the OpenACC standard that does not allow printing in device code?
Also, I found the following information:
https://parallel-computing.pro/index.php/11-openacc/53-using-cuda-device-functions-from-openacc.
Is it wrong?

_and · December 14, 2017, 3:37pm

Could you also answer another question? There is a loop in my code that is parallelized in the following simple way:
#pragma acc parallel loop present(particles)
for(int i=0; i<LIFE; ++i)
{…
50-80 lines of sequential code, 2 or 3 small if statements
}
In this loop I change the elements (from 0 to LIFE) of the array particles (on the GPU). There are no dependencies between the loop iterations.
My question is: is this the most efficient way to parallelize the loop? I am unsure because I don’t specify the gang() and vector_length() clauses (or, perhaps, the worker() clause).
Thank you.

MatColgrove · December 14, 2017, 4:32pm

“Is it the OpenACC standard that does not allow printing in device code?”

The OpenACC standard doesn’t define this. It’s more a limitation of PGI’s implementation.

“Is it wrong?”

I haven’t tried this particular code, but I assume that it works. The key difference is that you’d be calling printf from CUDA, not OpenACC.
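If calling into CUDA is not an option, one common workaround (not something proposed in this thread) is to store the values you would have printed in a buffer inside the parallel region and print them on the host afterwards. The following is only a minimal sketch under that assumption; the function names, the `dbg` buffer, and the arithmetic are hypothetical, and the `copy`/`copyout` clauses stand in for whatever data management the real code uses.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical kernel: instead of calling printf() inside the region,
   each iteration records the value it would have printed in dbg[i].
   A compiler without OpenACC support ignores the pragma and runs the
   loop sequentially on the host. */
void compute_with_log(float *restrict data, float *restrict dbg, int n)
{
    assert(n >= 0);
    #pragma acc parallel loop copy(data[0:n]) copyout(dbg[0:n])
    for (int i = 0; i < n; ++i) {
        data[i] = data[i] * 2.0f + 1.0f;  /* placeholder computation */
        dbg[i] = data[i];                 /* record instead of printing */
    }
}

/* Runs on the host after the parallel region has finished. */
void print_log(const float *dbg, int n)
{
    for (int i = 0; i < n; ++i)
        printf("i=%d value=%f\n", i, dbg[i]);
}
```

Call compute_with_log() first and print_log() once it returns. The output is then ordered by index rather than by execution order on the device, which may or may not be what you want when debugging.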

“My question is: is this the most efficient way to parallelize the loop? I am unsure because I don’t specify the gang() and vector_length() clauses (or, perhaps, the worker() clause).”

I’d just let the compiler schedule the loop. It typically does a good job, especially for a single loop level.

Of course, feel free to try various other schedules, but most likely the compiler will automatically find the optimal schedule.
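To compare schedules, the two variants might look roughly like the sketch below. This is an illustration only: the loop body is a hypothetical stand-in for the 50-80 lines in the question, `copy(...)` replaces the original `present(particles)` so the snippet stands alone, and the vector length of 128 is an arbitrary starting value rather than a recommendation.

```c
#include <assert.h>

/* Variant 1: no schedule clauses, the compiler picks gang/vector sizes. */
void step_default(float *particles, int n)
{
    assert(n >= 0);
    #pragma acc parallel loop copy(particles[0:n])
    for (int i = 0; i < n; ++i) {
        /* hypothetical body with one small if statement */
        if (particles[i] > 0.0f)
            particles[i] *= 0.5f;
        else
            particles[i] += 1.0f;
    }
}

/* Variant 2: explicit gang/vector schedule with a fixed vector length.
   128 is an assumed value to experiment with, not a tuned one. */
void step_explicit(float *particles, int n)
{
    assert(n >= 0);
    #pragma acc parallel loop gang vector vector_length(128) copy(particles[0:n])
    for (int i = 0; i < n; ++i) {
        if (particles[i] > 0.0f)
            particles[i] *= 0.5f;
        else
            particles[i] += 1.0f;
    }
}
```

Both variants compute the same result; only the mapping of iterations to the device differs, so time each one on the real loop and keep the explicit clauses only if they measurably help.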

-Mat
