OpenMP GPU on Jetson AGX Xavier

I’m using GCC 7 on the AGX and would like to use #pragma omp target teams to offload computation and data to the Xavier’s GPU. However, only one team is created even when I set num_teams. I think num_teams only sets an upper limit rather than the number of teams I actually get.
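To see how many teams the runtime really creates, a minimal sketch like the one below could be used (the function name teams_actually_created is hypothetical). It requests a number of teams and has team 0 copy omp_get_num_teams() back to the host. In OpenMP 4.5, which GCC 7 implements for C, the number of teams created is implementation defined but no larger than the num_teams value, so any result from 1 up to the requested count is conforming.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Requests `requested` teams and returns how many the runtime created.
   Team 0 writes the count back to the host through map(from:). */
int teams_actually_created(int requested)
{
    int created = 1; /* without -fopenmp the pragma is ignored: one "team" */
#ifdef _OPENMP
    #pragma omp target teams num_teams(requested) map(from: created)
    {
        if (omp_get_team_num() == 0)
            created = omp_get_num_teams();
    }
#endif
    return created;
}
```

Calling teams_actually_created(4) and getting 1 back is consistent with the behavior described above; it does not by itself show whether the region ran on the GPU.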

My question is: do I have to rebuild GCC with offloading support for NVIDIA GPUs (the whole nvptx-none setup and flag) before I can use the target construct? If so, is there an easier way to build GCC with that capability?
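Before rebuilding anything, it may help to confirm where the target region actually executes. The sketch below (function names are hypothetical) uses the standard OpenMP routines omp_get_num_devices() and omp_is_initial_device(). If the region reports that it ran on the host, the runtime most likely fell back to host execution because it has no usable offload device; a GCC built without an nvptx offload plugin typically reports 0 devices. A GCC configured for nvptx offloading is normally invoked with -fopenmp -foffload=nvptx-none; whether such a build is readily available for the Xavier is not established here.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns 1 if a target region executed on the host (fallback),
   0 if it executed on an offload device such as the GPU. */
int target_ran_on_host(void)
{
    int on_host = 1; /* without -fopenmp everything runs on the host */
#ifdef _OPENMP
    #pragma omp target map(from: on_host)
    {
        on_host = omp_is_initial_device() ? 1 : 0;
    }
#endif
    return on_host;
}

/* Prints what the OpenMP runtime can see. */
void report_offload_status(void)
{
#ifdef _OPENMP
    printf("offload devices visible: %d\n", omp_get_num_devices());
#endif
    printf("target region ran on host: %s\n",
           target_ran_on_host() ? "yes" : "no");
}
```

Running report_offload_status() once on the AGX with the current compiler gives a quick answer to whether any rebuild is needed at all.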

Hi,

Not sure if this answers your question correctly.
Xavier only has one GPU. Is the parameter limited by the number of available GPUs?

Thanks.

I don’t think the number of teams is limited by how many GPU devices there are. I could be wrong.

If the number of teams is limited by the number of devices, does that mean I don’t need to add any additional flags to GCC to get the target region executed on the device?

Hi,

Sorry, I’m not familiar with OpenMP.

Would you mind sharing a simple reproducible sample?
Then I can check this with our internal team and give you a clearer response.

Thanks.

Hello,

This is a simple matrix multiplication that is intended to be computed on the GPU:

#pragma omp target data map (to: pA[0:NN],pB[0:NN]) map (tofrom: pC[0:N*N])
#pragma omp target
#pragma omp teams distribute parallel for collapse(2) private(i,j,k)
for(i=0;i<N;i++)
{
    for(j=0;j<N;j++)
    {
        for(k=0;k<N;k++)
        {
            pC(i,j)+=pA(i,k)*pB(k,j);
        }
    }
}
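For anyone trying to reproduce this, here is a self-contained sketch built around the same directives. The pA/pB/pC indexing macros, the row-major layout, and NN meaning N*N are assumptions, since the post does not show those definitions, and the function name matmul_offload_check is hypothetical. It checks the result against a plain host loop, so it also confirms correctness when the region falls back to the CPU.

```c
#include <stdlib.h>

/* Hypothetical row-major indexing macros. A function-like macro only
   expands when followed by '(', so pA[0:NN] in the map clause still
   names the pointer itself. */
#define pA(i, k) pA[(i) * N + (k)]
#define pB(k, j) pB[(k) * N + (j)]
#define pC(i, j) pC[(i) * N + (j)]

/* Multiplies two N x N matrices using the directives from the post and
   compares against a host loop. Returns 0 on success, 1 on mismatch,
   -1 if allocation fails. */
int matmul_offload_check(int N)
{
    const int NN = N * N;                 /* assumed meaning of NN */
    double *pA = malloc(NN * sizeof *pA);
    double *pB = malloc(NN * sizeof *pB);
    double *pC = calloc(NN, sizeof *pC);  /* pC accumulates, so start at 0 */
    int i, j, k, status = 0;

    if (!pA || !pB || !pC) { free(pA); free(pB); free(pC); return -1; }
    for (i = 0; i < NN; i++) { pA[i] = (i % 7) * 0.5; pB[i] = (i % 5) - 2.0; }

#pragma omp target data map(to: pA[0:NN], pB[0:NN]) map(tofrom: pC[0:NN])
#pragma omp target
#pragma omp teams distribute parallel for collapse(2) private(i, j, k)
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < N; k++)
                pC(i, j) += pA(i, k) * pB(k, j);

    /* Host reference. The inputs are small multiples of 0.5, so both
       sums are exact in double and can be compared directly. */
    for (i = 0; i < N && status == 0; i++)
        for (j = 0; j < N; j++) {
            double ref = 0.0;
            for (k = 0; k < N; k++)
                ref += pA(i, k) * pB(k, j);
            if (ref != pC(i, j)) { status = 1; break; }
        }

    free(pA); free(pB); free(pC);
    return status;
}
```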

Please ask them whether I need to do anything different with my GCC compiler for the above code to run on the GPU.

Also, I have a question about the speed of the AGX compared with an Intel i7 3.8 GHz PC. Should the difference really be this large? The AGX is about 10 times slower in comparison.
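For a like-for-like comparison, it helps to time only the multiplication on both machines with the same matrix size and the same build flags. A sketch using omp_get_wtime() follows; time_matmul and now_seconds are hypothetical names, and the timed span deliberately includes the host-to-device copies made by target data. The loop body uses a local accumulator instead of updating the output in place, a common choice for this kernel but not what the original post does.

```c
#include <stdlib.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds under OpenMP; without OpenMP this falls back to
   processor time from clock(), which is only a rough substitute. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Times one n x n multiply, including data movement for the target
   region. Returns elapsed seconds, or -1.0 if allocation fails. */
double time_matmul(int n)
{
    size_t nn = (size_t)n * n;
    double *a = malloc(nn * sizeof *a);
    double *b = malloc(nn * sizeof *b);
    double *c = calloc(nn, sizeof *c);
    double t0, t1;

    if (!a || !b || !c) { free(a); free(b); free(c); return -1.0; }
    for (size_t x = 0; x < nn; x++) { a[x] = 1.0; b[x] = 2.0; }

    t0 = now_seconds();
#pragma omp target data map(to: a[0:nn], b[0:nn]) map(tofrom: c[0:nn])
#pragma omp target teams distribute parallel for collapse(2)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[(size_t)i * n + k] * b[(size_t)k * n + j];
            c[(size_t)i * n + j] = sum;
        }
    t1 = now_seconds();

    free(a); free(b); free(c);
    return t1 - t0;
}
```

Calling, for example, time_matmul(1024) on each machine and comparing the returned seconds shows whether the gap comes from the kernel itself; if the region is silently running on the host, the AGX number reflects its CPU rather than its GPU.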

Thanks for your assistance so far.

Hi,

Sorry for the late update.

Based on this document, it looks like the target value is limited to the number of devices.
Since Xavier only has one GPU, the value is set to 1.

Thanks.