- I compile and run my code with this command:
pgc++ -fast -acc -ta=tesla -Minfo=accel -o output code.cpp
I used a simple piece of code:
#pragma acc parallel loop
for (int i = 1; i < 100; i++)
{
    a[i] = 10;
    y[i] = a[i-1];
}
I got this result:
main:
11, Generating Tesla code
15, #pragma acc loop gang, vector(99) /* blockIdx.x threadIdx.x */
11, Generating implicit copyin(a[:99]) [if not already present]
Generating implicit copyout(y[1:99]) [if not already present]
Generating implicit allocate(a[:100]) [if not already present]
Generating implicit copyout(a[1:99]) [if not already present]
- I want to know whether the loop was parallelized successfully, and how I can tell this from the compiler output in the terminal.
- I am using -Minfo=accel; is there another flag that shows more detailed information?
- Is there any flag I can use to show the execution time? I want to compare the serial and parallel runs of the code and see the difference.
- In the statement y[i] = a[i-1];, is there a loop-carried dependency or not?
- I ran another piece of code:
#pragma acc kernels
for (int i = 1; i < 100; i++)
{
    x[i] = x[i-1];
}
and the result was as follows:
main:
11, Generating implicit allocate(x[:100]) [if not already present]
Generating implicit copyin(x[:99]) [if not already present]
Generating implicit copyout(x[1:99]) [if not already present]
15, Loop carried dependence of x prevents parallelization
Loop carried backward dependence of x prevents vectorization
Accelerator serial kernel generated
Generating Tesla code
15, #pragma acc loop seq
- Is this code parallelized successfully or not?
- I read some documents mentioning that with #pragma acc loop seq the loop cannot run in parallel, but I did not write that directive myself. Is the #pragma acc loop seq shown in the output a suggestion from OpenACC, or has the kernels directive already applied it to my code?
Thanks in advance.