The next line in the output should read “Thread 1 loading module onto device 0”, so the exe is dying when trying to load the kernel module onto the device. Why, I’m not sure.
Device query programs such as pgaccelinfo don’t necessarily run on the device; rather, they just query the driver. So the first thing I’d like you to try is running a CUDA C code that does some compute. Nothing big, but something from the CUDA C SDK should suffice.
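If the SDK samples aren’t handy, a small vector add is enough to check that kernels really load and execute on the device. This is only a minimal sketch, not anything PGI-specific: the file name vecadd.cu, the kernel name vecAdd and the CHECK macro are made up for illustration, and it assumes nvcc from your existing CUDA toolkit is on the PATH.

```cuda
// vecadd.cu (hypothetical file name): minimal test that runs real work on the GPU.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Illustrative helper: abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err));                         \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    int i, errors = 0;

    // Host buffers with known contents, so the result is easy to verify.
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    if (!ha || !hb || !hc) {
        fprintf(stderr, "host allocation failed\n");
        return EXIT_FAILURE;
    }
    for (i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device buffers and host-to-device copies.
    float *da, *db, *dc;
    CHECK(cudaMalloc((void **)&da, bytes));
    CHECK(cudaMalloc((void **)&db, bytes));
    CHECK(cudaMalloc((void **)&dc, bytes));
    CHECK(cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice));

    // Launch the kernel, then check both the launch and its completion.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // Copy the result back and compare against the expected value 3.0f.
    CHECK(cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost));
    for (i = 0; i < n; ++i)
        if (hc[i] != 3.0f) ++errors;
    printf("%s (%d mismatches)\n", errors ? "FAILED" : "PASSED", errors);

    CHECK(cudaFree(da));
    CHECK(cudaFree(db));
    CHECK(cudaFree(dc));
    free(ha); free(hb); free(hc);
    return errors ? EXIT_FAILURE : EXIT_SUCCESS;
}
```

Build it with something like `nvcc vecadd.cu -o vecadd` and launch it the same way you launch your other GPU programs. If it fails at the kernel launch or the synchronize, the problem is most likely in the driver/runtime setup rather than in the PGI-generated code.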
I had previously installed CUDA separately, not with the PGI compiler.
So during the PGI compiler installation I chose the ‘CUDA install: NO’ option.
Previously installed CUDA location: /usr/local/cuda-5.5/
I don’t know whether it is important or not…
CUDA and OpenCL examples work fine with optirun (bumblebee).
Thank you for your help!
Best regards,
Simi
PS: Here are the results of the CUDA and OpenCL examples.
CUDA example:
$ optirun ./convolutionSeparable
[./convolutionSeparable] - Starting...
GPU Device 0: "NVS 5400M" with compute capability 2.1
Image Width x Height = 3072 x 3072
Allocating and initializing host arrays...
Allocating and initializing CUDA arrays...
Running GPU convolution (16 identical iterations)...
convolutionSeparable, Throughput = 599.9124 MPixels/sec, Time = 0.01573 s, Size = 9437184 Pixels, NumDevsUsed = 1, Workgroup = 0
Reading back GPU results...
Checking the results...
...running convolutionRowCPU()
...running convolutionColumnCPU()
...comparing the results
...Relative L2 norm: 0.000000E+00
Shutting down...
Test passed
OpenCL example:
$ optirun ./oclConvolutionSeparable
[oclConvolutionSeparable] starting...
clGetPlatformID...
Get the Device info and select Device...
# of Devices Available = 1
Using Device 0: NVS 5400M
# of Compute Units = 2
./oclConvolutionSeparable Starting...
Allocating and initializing host memory...
Initializing OpenCL...
Initializing OpenCL separable convolution...
Loading ConvolutionSeparable.cl...
Creating convolutionSeparable program...
Building convolutionSeparable program...
Creating OpenCL memory objects...
Applying separable convolution to 3072 x 3072 image...
Reading back OpenCL results...
Comparing against Host/C++ computation...
Relative L2 norm: 0.000e+00
[oclConvolutionSeparable] test results...
PASSED
For compilation I didn’t use the -Mcuda option, and I got this message:
pgcc-Error-CUDA version 5.0 is not available in this installation
I thought the previously installed CUDA would be used by the PGI compiler.
So I’ve installed the PGI compiler again, this time with its CUDA installation, and the examples work fine!