PGI ACC release 11.0: Multiple GPUs using openmp

pengc · January 3, 2011, 11:29pm

I have a very simple test program for my workstation, which has two GPU cards:

#include <stdio.h>
#include <stdlib.h>
#include <accel.h>

#define N  128

int main(int argc, char *argv[]) {

	int i; 
	
	double *restrict a;
	a=(double *restrict)malloc(N*sizeof(double));
	omp_set_num_threads(2);
	
#pragma omp parallel
{ 
	int th_id;
		th_id = omp_get_thread_num();
		printf("Hello World from thread %d\n", th_id);
}
#pragma acc region
{ 

	for (i = 0; i < N; i++) {
			a[i]*=2.0;
	} 
} 
return 0; 
}

It was then compiled with the command:

pgcc test.c -o test -mp -fast -Minfo -ta=nvidia,time

The compiler (PGI Accelerator compiler, release 11.0) produced this output, with no errors or warnings:

main:
     16, Parallel region activated
     21, Parallel region terminated
         Generating copy(a[0:127])
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
     24, Loop is parallelizable
         Accelerator kernel generated
         24, #pragma acc for parallel, vector(128) /* blockIdx.x threadIdx.x */
             CC 1.3 : 4 registers; 28 shared, 4 constant, 0 local memory bytes; 100% occupancy
             CC 2.0 : 8 registers; 4 shared, 40 constant, 0 local memory bytes; 66% occupancy

When I run the binary, I get a segmentation fault.

Could anyone tell me what the problem might be?

MatColgrove · January 4, 2011, 12:59am

Hi pengc,

I tried your code and it worked fine for me. (See below). Does it still fail if you remove the OpenMP code? Does it fail if you remove the accelerator region? What is the output of the command ‘pgaccelinfo’?

Mat

% pgcc test.c -o test1.out -mp -fast -Minfo -ta=nvidia,time -V11.0 ; test1.out
main:
     16, Parallel region activated
     21, Parallel region terminated
         Generating copy(a[0:127])
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
     24, Loop is parallelizable
         Accelerator kernel generated
         24, #pragma acc for parallel, vector(128) /* blockIdx.x threadIdx.x */
             CC 1.3 : 4 registers; 28 shared, 4 constant, 0 local memory bytes; 100% occupancy
             CC 2.0 : 8 registers; 4 shared, 40 constant, 0 local memory bytes; 66% occupancy
Hello World from thread 0
Hello World from thread 1

Accelerator Kernel Timing data
/tmp/qa/test.c
  main
    21: region entered 1 time
        time(us): total=139074 init=138626 region=448
                  kernels=22 data=49
        w/o init: total=448 max=448 min=448 avg=448
        24: kernel launched 1 times
            grid: [1]  block: [128]
            time(us): total=22 max=22 min=22 avg=22
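
The two isolation tests suggested above (dropping the OpenMP code, dropping the accelerator region) can be built from a single source file. The following is only a sketch: the `SKIP_OMP` and `SKIP_ACC` switches and the `run_test` name are hypothetical, and it assumes the compiler defines `_ACCEL` when accelerator directives are enabled, as I understand the PGI Accelerator model to specify. Unlike the original test, it also includes omp.h, checks the malloc result, and initializes the array, which rules those out as causes.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>   /* declares omp_set_num_threads, omp_get_thread_num */
#endif

#define N 128

/* Hypothetical switches: compile with -DSKIP_OMP or -DSKIP_ACC
 * to leave out the OpenMP region or the accelerator region. */

/* Returns 0 on success, 1 if malloc fails, 2 if the result is wrong. */
int run_test(void)
{
    int i;
    double *restrict a = (double *)malloc(N * sizeof(double));

    if (a == NULL)
        return 1;
    for (i = 0; i < N; i++)
        a[i] = (double)i;          /* the original left a[] uninitialized */

#if defined(_OPENMP) && !defined(SKIP_OMP)
    omp_set_num_threads(2);
#pragma omp parallel
    {
        printf("Hello World from thread %d\n", omp_get_thread_num());
    }
#endif

    /* Assumption: _ACCEL is defined only when -ta=... is in effect. */
#if defined(_ACCEL) && !defined(SKIP_ACC)
#pragma acc region
#endif
    {
        for (i = 0; i < N; i++)
            a[i] *= 2.0;
    }

    /* Check on the host that every element was doubled. */
    for (i = 0; i < N; i++) {
        if (a[i] != 2.0 * i) {
            free(a);
            return 2;
        }
    }
    free(a);
    return 0;
}
```

Calling `run_test()` from `main` and building three times (no switch, `-DSKIP_OMP`, `-DSKIP_ACC`) with otherwise identical flags shows which combination fails, without hand-editing the source between runs.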

pengc · January 4, 2011, 7:21am

Thanks, Mat! But it still fails.
Output of pgaccelinfo:

-bash-3.2$ pgaccelinfo
CUDA Driver Version:           3020

Device Number:                 0
Device Name:                   GeForce GTX 480
Device Revision Number:        2.0
Global Memory Size:            1610285056
Number of Multiprocessors:     15
Number of Cores:               480
Concurrent Copy and Execution: Yes
Total Constant Memory:         65536
Total Shared Memory per Block: 49152
Registers per Block:           32768
Warp Size:                     32
Maximum Threads per Block:     1024
Maximum Block Dimensions:      1024, 1024, 64
Maximum Grid Dimensions:       65535 x 65535 x 1
Maximum Memory Pitch:          2147483647B
Texture Alignment:             512B
Clock Rate:                    1401 MHz
Initialization time:           1171282 microseconds
Current free memory:           1501691904
Upload time (4MB):             1030 microseconds ( 722 ms pinned)
Download time:                 1418 microseconds (1176 ms pinned)
Upload bandwidth:              4072 MB/sec (5809 MB/sec pinned)
Download bandwidth:            2957 MB/sec (3566 MB/sec pinned)

Device Number:                 1
Device Name:                   GeForce GTX 285
Device Revision Number:        1.3
Global Memory Size:            1073545216
Number of Multiprocessors:     30
Number of Cores:               240
Concurrent Copy and Execution: Yes
Total Constant Memory:         65536
Total Shared Memory per Block: 16384
Registers per Block:           16384
Warp Size:                     32
Maximum Threads per Block:     512
Maximum Block Dimensions:      512, 512, 64
Maximum Grid Dimensions:       65535 x 65535 x 1
Maximum Memory Pitch:          2147483647B
Texture Alignment:             256B
Clock Rate:                    1476 MHz
Initialization time:           1171282 microseconds
Current free memory:           1032916736
Upload time (4MB):             1147 microseconds ( 799 ms pinned)
Download time:                 2230 microseconds (2028 ms pinned)
Upload bandwidth:              3656 MB/sec (5249 MB/sec pinned)
Download bandwidth:            1880 MB/sec (2068 MB/sec pinned)
CAL version:                   1.4-553
No attached CAL devices

If I remove the OpenMP region:

main
    16: region entered 1 time
        time(us): total=2082729 init=2082413 region=316
                  kernels=42 data=26
        w/o init: total=316 max=316 min=316 avg=316
        19: kernel launched 1 times
            grid: [1]  block: [128]
            time(us): total=42 max=42 min=42 avg=42

If I remove the acc region:

pgcc test.c -o test1.out -mp -fast -Minfo -ta=nvidia,time -V11.0 ; ./test1.out
main:
     16, Parallel region activated
     22, Parallel region terminated
Hello World from thread 1
Hello World from thread 0

It still fails if both regions are present. Could there be some conflict between them?
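
Since the goal in the title is one GPU per OpenMP thread, a variant worth comparing puts the accelerator region inside the parallel region and binds each thread to a device first. This is a sketch under assumptions, not a confirmed fix for the segmentation fault: `device_for_thread` and `run_per_thread` are hypothetical names, the `acc_get_num_devices`, `acc_set_device_num` and `acc_device_nvidia` names are quoted from memory of PGI's accel.h and should be checked against the installed header, and device numbers are assumed to start at 0, as in the pgaccelinfo listing.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif
#ifdef _ACCEL            /* assumed to be defined by pgcc with -ta=nvidia */
#include <accel.h>
#endif

#define N 128

/* Round-robin mapping of a thread id onto ndev devices; -1 if none. */
int device_for_thread(int th_id, int ndev)
{
    return (ndev > 0) ? th_id % ndev : -1;
}

/* Each thread doubles its own private array, on its own device when
 * accelerator support is compiled in. Returns the number of threads
 * that failed to allocate or produced a wrong result. */
int run_per_thread(int nthreads)
{
    int failures = 0;
#ifdef _OPENMP
    omp_set_num_threads(nthreads);
#pragma omp parallel reduction(+:failures)
#else
    (void)nthreads;      /* without OpenMP the block below runs once */
#endif
    {
        int i, th_id = 0;
        double *restrict a = (double *)malloc(N * sizeof(double));
#ifdef _OPENMP
        th_id = omp_get_thread_num();
#endif
        if (a == NULL) {
            failures += 1;
        } else {
            for (i = 0; i < N; i++)
                a[i] = (double)i;
#ifdef _ACCEL
            {
                int dev = device_for_thread(
                    th_id, acc_get_num_devices(acc_device_nvidia));
                if (dev >= 0)
                    acc_set_device_num(dev, acc_device_nvidia);
            }
#pragma acc region
#endif
            {
                for (i = 0; i < N; i++)
                    a[i] *= 2.0;
            }
            /* Host-side check of this thread's result. */
            for (i = 0; i < N; i++) {
                if (a[i] != 2.0 * i) {
                    failures += 1;
                    break;
                }
            }
            printf("thread %d finished\n", th_id);
            free(a);
        }
    }
    return failures;
}
```

Calling `run_per_thread(2)` on this workstation would send thread 0 to the GTX 480 and thread 1 to the GTX 285 if the numbering assumption holds. Giving each thread its own array keeps the data movement for the two devices independent.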
