Why does `default(none)` not always take effect in OpenMP?

I am currently using the latest version of the NVC compiler (22.3) to overlap data transfers and computations on a node with multiple Nvidia A100 GPUs. My implementation relies on OpenMP target offloading. However, a data transfer occasionally fails with the following error:

call to cuStreamSynchronize returned error 201: Invalid context

I am trying to run the data transfers in parallel by using the following construction:

#pragma omp parallel
{
	#pragma omp single nowait
	{
		#pragma omp taskgroup
		for (int i=0;i<num_targets;i++){
			#pragma omp task
			// create tasks for host to device transfers
		}
	}
}

I think the issue is related to data scoping, and I have noticed that the data scoping clauses do not seem to have any effect at all. For instance, if I write

#pragma omp parallel default(none)
{
	#pragma omp single nowait
	{
		#pragma omp taskgroup
		for (int i=0;i<num_targets;i++){
			#pragma omp task default(none)
			// create tasks for host to device transfers here
		}
	}
}

I do not get any errors or warnings, but if I add default(none) to a #pragma omp parallel for or #pragma omp target teams distribute directive, I get the appropriate errors:

"some_file.c", line some_line: error: item must appear in a SHARED or PRIVATE clause: some_variable_name

Is it a bug in the compiler, or am I doing something wrong?
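
For reference, here is a minimal sketch of how the tasking construction might look when every variable is scoped explicitly. With default(none), a conforming compiler is expected to diagnose any referenced variable that lacks a predetermined data-sharing attribute and is missing from the clauses, so removing, say, num_targets from the shared list should produce an error. The names transfer_all and copy_chunk are hypothetical and not taken from the attached reproducer, and the copy is a host-side stand-in for the actual host-to-device transfer.

```c
#include <stddef.h>

/* Hypothetical stand-in for one host-to-device transfer: copies the chunk
 * assigned to target i. A real version would issue the OpenMP target data
 * transfer for device i here instead of a host-side copy. */
static void copy_chunk(double *dst, const double *src, size_t chunk, int i)
{
    for (size_t k = 0; k < chunk; k++)
        dst[(size_t)i * chunk + k] = src[(size_t)i * chunk + k];
}

/* Creates one task per target, with every variable scoped explicitly. */
void transfer_all(double *dst, const double *src, size_t chunk, int num_targets)
{
    #pragma omp parallel default(none) shared(dst, src, chunk, num_targets)
    {
        #pragma omp single nowait
        {
            /* The taskgroup waits for all tasks created in the loop. */
            #pragma omp taskgroup
            for (int i = 0; i < num_targets; i++) {
                /* i is captured by value when the task is created. */
                #pragma omp task default(none) firstprivate(i) shared(dst, src, chunk)
                {
                    copy_chunk(dst, src, chunk, i);
                }
            }
        }
    }
}
```

Calling transfer_all(dst, src, chunk, num_targets) copies num_targets chunks of chunk elements each. If the file is compiled without OpenMP support, the pragmas are ignored and the same code runs serially.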

Hi AntonRydahl,

Do you have a reproducing example that you can share?

The “invalid context” error may be a compiler issue, but having example code will help determine the cause.

For the “default(none)” issue, I’ll need to consult with my OpenMP team on what the expected behavior should be.

Thanks,
Mat

Hi Mat

I am sorry for the late reply.

I have tried to make a downscaled version of the code that I am currently working on. In transfer_to_device_fail you can find the non-working version of the data transfer. I have written a few comments indicating where removing some of the variables from the data scoping clauses appears to have no effect.

For the failing test, I use two Nvidia A100 GPUs and run the code with OMP_NUM_THREADS=4 or more.

nvidia_bug_report.zip (5.5 KB)

Thanks Anton.

For me, the code only fails intermittently (about 10% of the time), but I am able to reproduce the error.

I haven’t tried using tasks to interleave data movement myself, so I have sent your code to some folks on our OpenMP team to help. It may be a day or so before I hear back, but I will let you know what they say.

-Mat

Hi Mat

Thank you for the great and fast service! It also only occasionally fails when I run the code. I look forward to hearing what is going on.

Kind regards, Anton

Engineering took a look and confirmed that it is a compiler issue which we’ve filed as TPR #31805.

Thanks for the report!
Mat

Hi Anton,

TPR #31805 was fixed in the 22.5 release.

-Mat

Great work! Thank you so much for fixing this issue so fast! :D
