OpenACC on NVIDIA OPTIMUS card

ksimon · January 7, 2014, 1:40pm

Hi,

I’m new to PGI and OpenACC.
My laptop has an NVIDIA NVS 5400M GPU.
I’ve installed the PGI compiler on Ubuntu 12.04.3.

Under Linux, Optimus-based cards can be used via Bumblebee (http://bumblebee-project.org/).

Without Bumblebee, pgaccelinfo reports:

$ ./pgaccelinfo 
CUDA Driver Version:           5050
FATAL: Module nvidia not found.
No accelerators found.
Try ./pgaccelinfo -v for more information

With Bumblebee:

$ optirun ./pgaccelinfo
CUDA Driver Version:           5050
NVRM version:                  NVIDIA UNIX x86_64 Kernel Module  319.37  Wed Jul  3 17:08:50 PDT 2013
Device Number:                 0
Device Name:                   NVS 5400M
Device Revision Number:        2.1
Global Memory Size:            1073414144
Number of Multiprocessors:     2
Number of Cores:               64
Concurrent Copy and Execution: Yes
Total Constant Memory:         65536
Total Shared Memory per Block: 49152
Registers per Block:           32768
Warp Size:                     32
Maximum Threads per Block:     1024
Maximum Block Dimensions:      1024, 1024, 64
Maximum Grid Dimensions:       65535 x 65535 x 65535
Maximum Memory Pitch:          2147483647B
Texture Alignment:             512B
Clock Rate:                    950 MHz
Execution Timeout:             Yes
Integrated Device:             No
Can Map Host Memory:           Yes
Compute Mode:                  default
Concurrent Kernels:            Yes
ECC Enabled:                   No
Memory Clock Rate:             900 MHz
Memory Bus Width:              128 bits
L2 Cache Size:                 131072 bytes
Max Threads Per SMP:           1536
Async Engines:                 1
Unified Addressing:            No
Initialization time:           1149 microseconds
Current free memory:           1052237824
Upload time (4MB):              857 microseconds ( 691 ms pinned)
Download time:                  964 microseconds ( 674 ms pinned)
Upload bandwidth:              4894 MB/sec (6069 MB/sec pinned)
Download bandwidth:            4350 MB/sec (6223 MB/sec pinned)
PGI Compiler Option:           -ta=nvidia,cc20

So pgaccelinfo does find my GPU when it runs under Bumblebee.

I wanted to try some of the examples from /opt/pgi/etc/samples/openacc. They compile, but I have trouble running them.

Running acc_c1.exe without optirun gives:

$ ./acc_c1.exe 
FATAL: Module nvidia not found.
100000 iterations completed

That in itself is not a problem, but when I run it with optirun there is no output at all! (It should print “100000 iterations completed”.)

Does anybody have experience with Optimus / Bumblebee?

Thanks in advance for any help anybody could provide!

Best regards,
Simi

MatColgrove · January 7, 2014, 4:14pm

Hi Simi,

Try setting the environment variable “PGI_ACC_DEBUG=1”. This should tell us if anything is getting run on the device.

Mat

ksimon · January 8, 2014, 8:01am

Hi Mat,

I’ve tried it with the debug variable set:

$ PGI_ACC_DEBUG=1 optirun ./acc_c1.exe
ACC: detected 1 CUDA devices
ACC: device[1] is NVIDIA CUDA device 0
ACC: initialized 1 CUDA devices
ACC: device[2] is PGI native
ACC: device[0] is PGI native
pinitialize for thread 1
argument memory for queue 8 device:0x500100000 host:0x200000000
curr_devid for thread 1 is 1
pgi_uacc_begin( compute region, file=/opt/pgi/linux86-64/13.10/etc/samples/openacc/acc_c1.c, function=main, lines=17:44, startline=35, endline=38, devid=0, threadid=1 )
pgi_uacc_begin( file=/opt/pgi/linux86-64/13.10/etc/samples/openacc/acc_c1.c, function=main, lines=17:44, startline=35, endline=38, devid=1, threadid=1 ) dindex=1
pgi_uacc_enter( devid=1 )

I don’t know what the normal behaviour is, but it seems the program never returns from GPU execution.

Thank you for your help!

Best regards,
Simi

MatColgrove · January 8, 2014, 10:34pm

Hi Simi,

The next line in the output should read “Thread 1 loading module onto device 0”, so the executable is dying while trying to load the kernel module onto the device. I’m not sure why.

Device query programs such as pgaccelinfo don’t necessarily run anything on the device; they just query the driver. So the first thing I’d like you to try is running a CUDA C program that does some computation. Nothing big; something from the CUDA C SDK should suffice.

Mat

ksimon · January 9, 2014, 8:24am

Hi Mat,

I had previously installed CUDA separately, not with the PGI compiler, so during the PGI installation I chose the ‘CUDA install: NO’ option.
The existing CUDA installation is in /usr/local/cuda-5.5/.
I don’t know whether that matters or not…

The CUDA and OpenCL examples work fine with optirun (Bumblebee).

Thank you for your help!

Best regards,
Simi

PS: Here are the CUDA/OpenCL results.
CUDA example:

$ optirun ./convolutionSeparable 
[./convolutionSeparable] - Starting...
GPU Device 0: "NVS 5400M" with compute capability 2.1

Image Width x Height = 3072 x 3072

Allocating and initializing host arrays...
Allocating and initializing CUDA arrays...
Running GPU convolution (16 identical iterations)...

convolutionSeparable, Throughput = 599.9124 MPixels/sec, Time = 0.01573 s, Size = 9437184 Pixels, NumDevsUsed = 1, Workgroup = 0

Reading back GPU results...

Checking the results...
 ...running convolutionRowCPU()
 ...running convolutionColumnCPU()
 ...comparing the results
 ...Relative L2 norm: 0.000000E+00

Shutting down...
Test passed

OpenCL example:

$ optirun ./oclConvolutionSeparable 
[oclConvolutionSeparable] starting...

clGetPlatformID...
Get the Device info and select Device...
  # of Devices Available = 1
  Using Device 0: NVS 5400M
  # of Compute Units = 2
./oclConvolutionSeparable Starting...

Allocating and initializing host memory...
Initializing OpenCL...
Initializing OpenCL separable convolution...
Loading ConvolutionSeparable.cl...
Creating convolutionSeparable program...
Building convolutionSeparable program...
Creating OpenCL memory objects...
Applying separable convolution to 3072 x 3072 image...

Reading back OpenCL results...

Comparing against Host/C++ computation...
Relative L2 norm: 0.000e+00

[oclConvolutionSeparable] test results...
PASSED

ksimon · January 9, 2014, 12:40pm

Hi Mat,

I’ve found the solution!

I hadn’t been using the -Mcuda option for compilation; when I tried it, I got this message:

pgcc-Error-CUDA version 5.0 is not available in this installation

I thought the previously installed CUDA would be used by the PGI compiler.
So I reinstalled the PGI compiler, this time including its CUDA installation, and the examples work fine!

Thank you for your help!

Best regards,
Simi

MatColgrove · January 9, 2014, 3:34pm

Hi Simi,

You shouldn’t need the -Mcuda flag, but you do need to install the CUDA version that accompanies the PGI compilers.

Mat

ksimon · January 13, 2014, 8:02am

Hi Mat,

Yes, you’re right!
I didn’t know the -Mcuda flag isn’t necessary, but trying it, together with your help, led to the solution!

Thank you again!

Best regards,
Simi