Hi,
please consider the following code which launches threads to perform some actions in parallel on multiple GPUs:
#include <cstdlib>
#include <memory>
#include <thread>
#include <openacc.h>

int main(int argc, char* argv[])
{
    int a[10];
    int numDevices = acc_get_num_devices(acc_get_device_type());
    std::unique_ptr<std::thread[]> threads(new std::thread[numDevices]);
    for (int i = 0; i < numDevices; i++) {
        // Capture a by reference so the data clause below can refer to it.
        threads[i] = std::thread([i, &a]() {
            acc_set_device_num(i, acc_get_device_type());
            #pragma acc data create(a[0:10])
            { }
        });
    }
    for (int i = 0; i < numDevices; i++) {
        threads[i].join();
    }
    return EXIT_SUCCESS;
}
The code works fine with version 17.4 and the flags -std=c++11 -acc -ta=nvidia. However, I'd like to use pinned memory in a real application, so I'm passing -std=c++11 -acc -ta=nvidia:cc60,pinned instead. Please note first that I have to specify cc60 (or presumably another compute capability), or else the executable segfaults. But even then, the executable fails at runtime with:

call to cuMemAlloc returned error 201: Invalid context
- Is this code valid, i.e. am I allowed to use OpenACC directives from multiple host threads? The threads do not access the same device.
- Is there anything wrong with my compilation flags, or is this an error in the compiler / runtime?
Regards,
Jonas