I am using the latest version of the NVC compiler (22.3) to overlap data transfers and computations on a node with multiple Nvidia A100 GPUs. My implementation relies on OpenMP target offloading. However, a data transfer occasionally fails with the following error:
call to cuStreamSynchronize returned error 201: Invalid context
I am trying to run the data transfers in parallel using the following construct:
#pragma omp parallel
{
    #pragma omp single nowait
    {
        #pragma omp taskgroup
        for (int i = 0; i < num_targets; i++) {
            #pragma omp task
            {
                // create tasks for host to device transfers
            }
        }
    }
}
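A slightly more concrete sketch of this pattern is shown below, in case it makes the structure clearer. It is not the code from the attachment: transfer_to_devices, chunk_begin, and device_count are hypothetical names, and splitting one array into num_targets chunks and mapping chunk i to device i modulo the device count is an assumption made only for illustration. Each task issues one target enter data for its chunk.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: index of the first element of chunk i
   when n elements are split into `parts` contiguous chunks. */
size_t chunk_begin(size_t n, int parts, int i) {
    return (n * (size_t)i) / (size_t)parts;
}

/* Hypothetical helper: number of offload devices (0 when built without OpenMP). */
int device_count(void) {
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

/* Creates one task per target; each task maps its chunk to a device.
   Returns the number of transfer tasks that were created. */
int transfer_to_devices(double *data, size_t n, int num_targets) {
    int created = 0;
    int ndev = device_count();

    #pragma omp parallel default(none) shared(data, n, num_targets, ndev, created)
    {
        #pragma omp single nowait
        {
            #pragma omp taskgroup
            for (int i = 0; i < num_targets; i++) {
                double *chunk = data + chunk_begin(n, num_targets, i);
                size_t len = chunk_begin(n, num_targets, i + 1)
                           - chunk_begin(n, num_targets, i);
                /* Assumption: targets are assigned to devices round-robin. */
                int dev = (ndev > 0) ? i % ndev : 0;
                created++;  /* only the single thread updates this counter */

                #pragma omp task default(none) firstprivate(chunk, len, dev)
                {
                    /* Keeps the build warning-free when OpenMP is disabled. */
                    (void)chunk; (void)len; (void)dev;
                    /* Host-to-device transfer of this task's chunk. */
                    #pragma omp target enter data map(to: chunk[0:len]) device(dev)
                }
            }
        }
    }
    return created;
}
```

A matching target exit data with map(release: ...) or map(from: ...) would be needed later to free the device copies. The sketch also builds with OpenMP disabled, in which case the pragmas are ignored and no transfer takes place.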
I think the issue is related to data scoping, and I have noticed that the data scoping clauses do not seem to have any effect at all. For instance, if I write
#pragma omp parallel default(none)
{
    #pragma omp single nowait
    {
        #pragma omp taskgroup
        for (int i = 0; i < num_targets; i++) {
            #pragma omp task default(none)
            {
                // create tasks for host to device transfers here
            }
        }
    }
}
I do not get any errors or warnings. However, if I add default(none) to a #pragma omp parallel for or #pragma omp target teams distribute directive, I get the expected errors:
"some_file.c", line some_line: error: item must appear in a SHARED or PRIVATE clause: some_variable_name
Is it a bug in the compiler, or am I doing something wrong?
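For comparison, here is a minimal sketch of the explicit form that default(none) is meant to enforce on a task. The function scaled_sum and its variables are hypothetical and are not taken from the attached code. According to the OpenMP specification, every variable referenced inside a construct with default(none) that has no predetermined data-sharing attribute must be listed in a data-sharing clause, so removing scale from the task's clause list below would be expected to produce a diagnostic.

```c
#include <stddef.h>

/* Hypothetical example: sums scale * x[i] using one task per element,
   with every data-sharing attribute spelled out explicitly. */
double scaled_sum(const double *x, size_t n, double scale) {
    double total = 0.0;

    #pragma omp parallel default(none) shared(x, n, scale, total)
    {
        #pragma omp single
        {
            for (size_t i = 0; i < n; i++) {
                /* Under default(none), x, scale, total and i must all be listed here. */
                #pragma omp task default(none) firstprivate(i) shared(x, scale, total)
                {
                    double term = scale * x[i];
                    /* Tasks may run concurrently, so the update is atomic. */
                    #pragma omp atomic
                    total += term;
                }
            }
        }
        /* The implicit barrier at the end of single waits for all tasks. */
    }
    return total;
}
```

If a compiler accepts the task after a listed variable is removed, that would suggest default(none) is not being checked on that construct.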
I have tried to make a downscaled version of the code that I am currently working on. In transfer_to_device_fail you can find the non-working version of the data transfer. I have written a few comments indicating where removing some of the variables from the data scoping clauses appears to have no effect.
In the failing test I use two Nvidia A100 GPUs and run the code with OMP_NUM_THREADS=4 or more.
For me, the code only fails intermittently (about 10% of the time), but I am able to reproduce the error.
I haven’t tried using tasks to interleave data movement myself, so I have sent your code to some folks on our OpenMP team for help. It may be a day or so before I hear back, but I will let you know what they say.