Error with pinned memory and threads on the host

Hahnfeld · August 9, 2017, 1:03pm

Hi,

please consider the following code, which launches one host thread per GPU to perform some actions in parallel:

#include <cstdlib>
#include <memory>
#include <thread>

#include <openacc.h>

int main(int argc, char* argv[])
{
	int a[10];
	int numDevices = acc_get_num_devices(acc_get_device_type());
	std::unique_ptr<std::thread[]> threads(new std::thread[numDevices]);

	for (int i = 0; i < numDevices; i++) {
		threads[i] = std::thread([i, &a] ( ) {  // capture a: it is named in the data clause
			acc_set_device_num(i, acc_get_device_type());
			#pragma acc data create(a[0:10])
			{ }
		});
	}

	for (int i = 0; i < numDevices; i++) {
		threads[i].join();
	}

	return EXIT_SUCCESS;
}

The code works fine with version 17.4 when compiled with

-std=c++11 -acc -ta=nvidia

However, I’d like to use pinned memory in a real application, so I’m passing

-std=c++11 -acc -ta=nvidia:cc60,pinned

First, please note that I have to specify cc60 (or probably something similar), or else the executable segfaults. But even then, the executable fails at runtime with

call to cuMemAlloc returned error 201: Invalid context

Is this code valid, i.e., am I allowed to use OpenACC directives inside a threaded environment? The threads are not accessing the same device.
Is there anything wrong with my compilation, or is this an error in the compiler / runtime?

Regards,
Jonas

MatColgrove · August 9, 2017, 5:35pm

Hi Jonas,

Not sure if this is a PGI issue or a CUDA issue, since I can replicate the error even if I remove all the OpenACC constructs. With “pinned”, we replace the memory allocation calls with calls to “cudaMallocHost”. It seems that the segv occurs when this memory is being deallocated.
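To illustrate why an allocation-redirecting option can matter even with no OpenACC constructs left, here is a minimal host-only C++ sketch. It assumes, without confirmation from PGI, that “pinned” works by redirecting ordinary heap allocations; the names pinned_alloc, pinned_free and count_allocations_for_one_thread are hypothetical, and std::malloc/std::free stand in for cudaMallocHost/cudaFreeHost. It replaces the global operator new/delete and counts the heap allocations made just by starting and joining one std::thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <new>
#include <thread>

// Counters are constant-initialized, so they are safe to use even if
// operator new runs before dynamic initialization.
static std::atomic<long> g_allocs{0};
static std::atomic<long> g_frees{0};

// Hypothetical stand-in for cudaMallocHost: plain malloc plus a counter.
static void* pinned_alloc(std::size_t n) {
    g_allocs.fetch_add(1, std::memory_order_relaxed);
    void* p = std::malloc(n == 0 ? 1 : n);
    if (p == nullptr) throw std::bad_alloc();
    return p;
}

// Hypothetical stand-in for cudaFreeHost.
static void pinned_free(void* p) noexcept {
    if (p == nullptr) return;
    g_frees.fetch_add(1, std::memory_order_relaxed);
    std::free(p);
}

// Replacing the global operators routes every C++ heap allocation in the
// program (containers, smart pointers, std::thread internals) through the hooks.
void* operator new(std::size_t n) { return pinned_alloc(n); }
void* operator new[](std::size_t n) { return pinned_alloc(n); }
void operator delete(void* p) noexcept { pinned_free(p); }
void operator delete[](void* p) noexcept { pinned_free(p); }
void operator delete(void* p, std::size_t) noexcept { pinned_free(p); }
void operator delete[](void* p, std::size_t) noexcept { pinned_free(p); }

// Counts heap allocations made by starting and joining one empty thread.
long count_allocations_for_one_thread() {
    long before = g_allocs.load();
    std::thread t([] {});
    t.join();
    long after = g_allocs.load();
    assert(after >= before);  // the counter only grows
    return after - before;
}
```

On common standard libraries the count is at least one, because std::thread typically stores its callable on the heap; under an option that redirects heap allocations, the threading machinery itself would therefore be allocating and freeing that kind of memory.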

I’ve added a problem report (TPR#24636) and sent it on to engineering for investigation.

Thanks for the report!
Mat

Hahnfeld · August 10, 2017, 6:35am

Hi Mat,

thanks for your response and confirmation. cuMemAlloc should be responsible for allocation on the device, no?

But it really looks like it’s somehow related to C++ threads: everything works fine if I use pthreads, and OpenMP looks fine as well! I’ll use this as a workaround…
EDIT: No, the error just does not trigger on every execution. So no workaround for now…

Cheers,
Jonas

MatColgrove · August 10, 2017, 4:11pm

cuMemAlloc should be responsible for allocation on the device, no?

For the device data, yes. However, “pinned” pins host data to physical memory via a call to “cudaMallocHost”.
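The “Invalid context” wording relates to how CUDA tracks contexts: a context is made current per host thread, and CUDA_ERROR_INVALID_CONTEXT (201) most often means the calling thread has no current context. The host-only C++ sketch below models just that rule; model_set_device, model_device_alloc and run_workers are hypothetical names, not CUDA or PGI APIs. Workers that bind a device before allocating succeed, while the main thread, which never bound one, gets 201.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical host-only model; none of these are CUDA or OpenACC calls.
// kInvalidContext mirrors CUDA_ERROR_INVALID_CONTEXT (201), which the driver
// API returns most often when no context is current on the calling thread.
constexpr int kSuccess = 0;
constexpr int kInvalidContext = 201;

// Each host thread has its own binding; -1 means "no context current".
thread_local int t_current_device = -1;

void model_set_device(int dev) { t_current_device = dev; }

// Stand-in for a device allocation: succeeds only if this thread has a binding,
// and reports which device the allocation would have landed on.
int model_device_alloc(int* device_out) {
    assert(device_out != nullptr);
    if (t_current_device < 0) return kInvalidContext;
    *device_out = t_current_device;
    return kSuccess;
}

// One worker per device, each binding its own device before allocating,
// mirroring acc_set_device_num(i, ...) followed by the data region.
std::vector<int> run_workers(int num_devices, std::vector<int>& devices_seen) {
    std::vector<int> codes(num_devices, -1);
    devices_seen.assign(num_devices, -1);
    std::vector<std::thread> threads;
    for (int i = 0; i < num_devices; ++i) {
        // Each worker writes only its own slot, so there is no data race.
        threads.emplace_back([i, &codes, &devices_seen] {
            model_set_device(i);
            codes[i] = model_device_alloc(&devices_seen[i]);
        });
    }
    for (std::thread& t : threads) t.join();
    return codes;
}
```

The model only captures ordering within each thread: device selection has to happen before any call that needs a context, and a binding made in one thread does not carry over to another.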

-Mat
