Hi, I’ve been trying to use the PGI Accelerator compiler (version 11.10) to parallelize a code across both the CPUs and the GPUs. In the best case, I was hoping to have something like the following:
```c
#pragma omp parallel
{
    int tid = omp_get_thread_num();
    printf("id:%d\n", tid);
    if (tid == 0) {
        acc_set_device_num(0, acc_device_nvidia);
        #pragma acc region for
        ...
    } else if (tid == 1) {
        acc_set_device_num(1, acc_device_nvidia);
        #pragma acc region for
        ...
    } else {
        ...
    }
}
```
where … stands for some code to accelerate. I tried that first and ran into segfaults. Since then I’ve been trying anything I can think of, but whenever a program contains both an OpenMP (CPU) parallel region and an accelerator (GPU) region, I get a segfault in the CPU region. GDB gives me something like the following:
```
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2aaab14d6940 (LWP 6785)]
0x00000000004106df in _mp_penter64 ()
(gdb) bt
#0 0x00000000004106df in _mp_penter64 ()
```
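For comparison, here is a host-only sketch of the dispatch pattern from the first snippet. It assumes (a hypothetical choice, since the snippet elides the loop bodies) that the iteration space is split into one contiguous slice per OpenMP thread, with threads 0 and 1 being the ones that would drive GPUs 0 and 1. The helpers `gpu_for_thread`, `slice_for_thread` and `fill_by_role` are hypothetical names; the accelerator calls are only marked in comments, so the sketch builds with or without OpenMP enabled.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: GPU index a thread would drive, or -1 for
   host-only work. Mirrors the tid == 0 / tid == 1 / else branches. */
static int gpu_for_thread(int tid, int ngpus)
{
    return (tid < ngpus) ? tid : -1;
}

/* Hypothetical helper: half-open block [*lo, *hi) of n iterations
   assigned to thread tid out of nthreads (remainder spread over the
   first threads). */
static void slice_for_thread(int tid, int nthreads, int n, int *lo, int *hi)
{
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);
    int base = n / nthreads, extra = n % nthreads;
    *lo = tid * base + (tid < extra ? tid : extra);
    *hi = *lo + base + (tid < extra ? 1 : 0);
}

/* Fill a[0..n) with 1, each thread handling its own slice.
   Returns the number of threads in the team (1 without OpenMP). */
static int fill_by_role(int *a, int n, int ngpus)
{
    int used = 1;
    #pragma omp parallel
    {
        int tid = 0, nthreads = 1;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
        int lo, hi;
        slice_for_thread(tid, nthreads, n, &lo, &hi);
        int gpu = gpu_for_thread(tid, ngpus);
        /* Assumption: in the real code a thread with gpu >= 0 would call
           acc_set_device_num(gpu, acc_device_nvidia) and run this loop
           under "#pragma acc region for"; here every thread runs it on
           the host so the sketch needs no accelerator compiler. */
        (void)gpu;
        for (int i = lo; i < hi; i++)
            a[i] = 1;
        #pragma omp master
        used = nthreads;
    }
    return used;
}
```

Each thread writes only its own slice, so no synchronization is needed beyond the implicit barrier at the end of the parallel region.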
I’ve tried everything I can think of to get the two to work together, up to and including using pthreads to create the outer threads and running the CPU and GPU regions in separate pthreads, and I always get the same result. Once in a very long while the code runs without a segfault, but when that happens it hangs instead. The most basic version of what I’ve been trying to do is this:
```c
#include <stdio.h>
#include <omp.h>

#define SIZE 100

int main(int argc, char *argv[])
{
    int stuff[SIZE];
    int limit = omp_get_thread_limit();
    printf("limit:%d\n", limit);

    #pragma omp parallel shared(stuff)
    {
        int tid = omp_get_thread_num();
        int i;
        /* Only thread 0 launches the accelerator region. */
        if (tid == 0) {
            #pragma acc region for copy(stuff)
            for (i = 0; i < SIZE; i++)
            {
                stuff[i] = 1;
            }
        }
        printf("thread_id:%d\n", tid);
    }
    return 0;
}
```
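A minimal host-only sketch of the pthreads variant mentioned above, assuming two outer threads (one for the GPU region, one for the CPU region) that each own half of the array. The names `run_two_roles`, `worker` and `struct worker_arg` are hypothetical, and the accelerator loop is replaced by a plain host loop, so this only illustrates the thread structure, not a fix for the segfault.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical roles for the two outer threads. */
enum role { ROLE_GPU, ROLE_CPU };

/* Hypothetical per-thread argument: role and owned range [lo, hi). */
struct worker_arg {
    enum role role;
    int *data;
    int lo, hi;
};

static void *worker(void *p)
{
    struct worker_arg *w = p;
    /* Assumption: for ROLE_GPU the real code would run this loop under
       "#pragma acc region for copy(...)"; here both roles use the host. */
    for (int i = w->lo; i < w->hi; i++)
        w->data[i] = 1;
    return NULL;
}

/* Start one thread per role, join every thread that was created,
   and return 0 only if both threads were started. */
static int run_two_roles(int *data, int n)
{
    assert(n >= 0);
    pthread_t th[2];
    struct worker_arg args[2] = {
        { ROLE_GPU, data, 0, n / 2 },
        { ROLE_CPU, data, n / 2, n },
    };
    int created = 0;
    for (; created < 2; created++)
        if (pthread_create(&th[created], NULL, worker, &args[created]) != 0)
            break;
    for (int t = 0; t < created; t++)
        pthread_join(th[t], NULL);
    return created == 2 ? 0 : -1;
}
```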
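The following also fails, even though the OpenMP parallel region and the accelerator region now run one after the other instead of being nested: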
```c
#include <stdio.h>
#include <omp.h>

#define SIZE 100

int main(int argc, char *argv[])
{
    int stuff[SIZE];
    int limit = omp_get_thread_limit();
    printf("limit:%d\n", limit);

    /* CPU-only OpenMP region. */
    #pragma omp parallel shared(stuff)
    {
        int tid = omp_get_thread_num();
        printf("thread_id:%d\n", tid);
    }

    /* Accelerator region, run after the OpenMP region has finished. */
    int i;
    #pragma acc region for copy(stuff)
    for (i = 0; i < SIZE; i++)
    {
        stuff[i] = 1;
    }
    return 0;
}
```
This seems to be the same bug as in the thread "PGI ACC release 11.0: Multiple GPUs using openmp", since the program works fine if I remove either region but not at all if both are present. That thread has no resolution, though. Any ideas what might be going wrong?
Platform details:
2 Tesla C2050 GPUs
2 6-core Intel CPUs
CHAOS Linux 2.6.18-107
PGI Accelerator 11.10