Hello,
I always get a runtime error when I execute my program. I think the error is caused by dynamically loading two shared objects that both contain OpenACC code. I have been dealing with this problem for a few days but I can’t find my mistake.
I’m using pgcc 14.7 to compile the OpenACC code and create the shared objects for an NVIDIA Kepler GPU.
Here is the output of the program:
====ServiceTester====
SourceImage: lenna.tiff
TargetImage: lenna.png
Algorithm: service_dim
Resource: gpu
Load Library: libservice_dim_gpu.so
Load Library: libservice_grey_gpu.so
====Processing====
Read source image: lenna.tiff width: 512 height: 512
Load Function: run_service_dim_gpu
Executed run_service_dim_gpu(). Status Code = 0
Wrote target image: lenna.png
====Processing====
Read source image: lenna.tiff width: 512 height: 512
Load Function: run_service_grey_gpu
call to cuModuleGetFunction returned error 500: Not found
First, both libraries are opened with dlopen(), as you can see in the following example:
void *handler = dlopen("libservice_dim_gpu.so", RTLD_LAZY | RTLD_GLOBAL);
void *handler2 = dlopen("libservice_grey_gpu.so", RTLD_LAZY | RTLD_GLOBAL );
This works fine. In the next step I load the function from the first shared object with dlsym():
run_service = (run_service_t) dlsym(handler,"run_service_dim_gpu");
run_service_t is defined as:
typedef int32_t (*run_service_t)(PixelPacket *, PixelPacket *, int32_t, int32_t);
As you can see in the output above, this works: I can call the function without any problems.
Then I do the same for the function in the second shared object:
run_service = (run_service_t) dlsym(handler2,"run_service_grey_gpu");
and call the function with:
statuscode=run_service(pixpack_target, pixpack_source, rows, columns);
Now I just get the error message:
call to cuModuleGetFunction returned error 500: Not found
At the end I close the libraries, but the program never reaches this point:
closeLibrary(&handler);
closeLibrary(&handler2);
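To make the load-and-resolve step described above easier to check, here is a minimal sketch that opens one library, resolves one function from that library's own handle, and reports dlerror() if either step fails. The helper name load_service and the opaque PixelPacket declaration are assumptions for illustration only; they are not taken from the actual program.

```c
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

/* Opaque stand-in for the real PixelPacket type (assumption: the real
 * definition comes from the image library headers). */
typedef struct PixelPacket PixelPacket;
typedef int32_t (*run_service_t)(PixelPacket *, PixelPacket *, int32_t, int32_t);

/* Hypothetical helper: opens lib_name and resolves func_name from that
 * library's own handle. Returns NULL after printing dlerror() if either
 * step fails. On success *handle_out holds the handle for dlclose(). */
static run_service_t load_service(const char *lib_name, const char *func_name,
                                  void **handle_out)
{
    *handle_out = dlopen(lib_name, RTLD_LAZY | RTLD_GLOBAL);
    if (*handle_out == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", lib_name, dlerror());
        return NULL;
    }

    dlerror(); /* clear any stale error state before calling dlsym() */
    void *sym = dlsym(*handle_out, func_name);
    const char *err = dlerror();
    if (err != NULL || sym == NULL) {
        fprintf(stderr, "dlsym(%s) failed: %s\n", func_name,
                err != NULL ? err : "symbol resolved to NULL");
        dlclose(*handle_out);
        *handle_out = NULL;
        return NULL;
    }
    return (run_service_t)sym;
}
```

With a helper like this, each function would be looked up through the handle of the library that defines it, for example load_service("libservice_grey_gpu.so", "run_service_grey_gpu", &handler2). On older glibc versions the program has to be linked with -ldl.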
I’ve already tried opening the first library, calling its function and closing it again before opening the second library and calling the second function.
This works, but later on I want to open more than two libraries and execute all the functions in a random order, so closing a library and reopening it later would take unnecessarily long.
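The intended usage pattern (open each library once, keep it open, look functions up in any order, close everything at the end) could be sketched roughly as follows. LibCache, lib_cache_get, lib_cache_close_all and MAX_LIBS are hypothetical names, not part of the actual program, and the sketch only illustrates the bookkeeping; it does not address the cuModuleGetFunction error itself.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_LIBS 16 /* hypothetical upper bound, adjust as needed */

/* One cached entry per shared object that has been opened so far. */
typedef struct {
    const char *name; /* library file name; the string must outlive the cache */
    void *handle;     /* handle returned by dlopen() */
} LibEntry;

typedef struct {
    LibEntry entries[MAX_LIBS];
    size_t count;
} LibCache;

/* Returns the cached handle for name, opening the library on first use.
 * Returns NULL if dlopen() fails or the cache is full. */
static void *lib_cache_get(LibCache *cache, const char *name)
{
    for (size_t i = 0; i < cache->count; ++i) {
        if (strcmp(cache->entries[i].name, name) == 0)
            return cache->entries[i].handle;
    }
    if (cache->count == MAX_LIBS) {
        fprintf(stderr, "library cache is full\n");
        return NULL;
    }
    void *handle = dlopen(name, RTLD_LAZY | RTLD_GLOBAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", name, dlerror());
        return NULL;
    }
    cache->entries[cache->count].name = name;
    cache->entries[cache->count].handle = handle;
    cache->count++;
    return handle;
}

/* Closes every cached library once, e.g. at program shutdown. */
static void lib_cache_close_all(LibCache *cache)
{
    for (size_t i = 0; i < cache->count; ++i)
        dlclose(cache->entries[i].handle);
    cache->count = 0;
}
```

A caller would start with a zero-initialised LibCache, pass the handle from lib_cache_get() to dlsym() for each function it needs, and call lib_cache_close_all() once at the end.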
Here is one example of the OpenACC code; the other function is very similar:
/*service_grey_gpu.c*/
#include "service_grey.h"

int32_t run_service_grey_gpu(PixelPacket *pixpack_target, PixelPacket *pixpack_source, int32_t rows, int32_t columns) {
    //Transform RGB to grey picture
    int32_t pos,i,j;

    #pragma acc kernels copyin(pixpack_source[0:columns*rows]) copyout(pixpack_target[0:columns*rows])
    {
        #pragma acc loop independent
        for (i=0; i<rows; ++i) {
            #pragma acc loop independent private(pos)
            for (j=0; j<columns; ++j) {
                pos = i*columns+j;
                pixpack_target[pos].red = (pixpack_source[pos].red+pixpack_source[pos].green+pixpack_source[pos].blue)/3;
                pixpack_target[pos].green = (pixpack_source[pos].red+pixpack_source[pos].green+pixpack_source[pos].blue)/3;
                pixpack_target[pos].blue = (pixpack_source[pos].red+pixpack_source[pos].green+pixpack_source[pos].blue)/3;
                pixpack_target[pos].opacity = 0;
            }
        }
    }
    return 0;
}
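For comparing results, a plain CPU version of the same averaging could look like the sketch below. RefPixel and run_service_grey_ref are hypothetical names, and the 16-bit field layout is an assumption; the real PixelPacket from service_grey.h may use different field types or ordering.

```c
#include <stdint.h>

/* Hypothetical pixel layout for illustration only; the real PixelPacket
 * may differ in field types and order. */
typedef struct {
    uint16_t blue, green, red, opacity;
} RefPixel;

/* CPU reference for the grey conversion, without OpenACC directives. */
static int32_t run_service_grey_ref(RefPixel *target, const RefPixel *source,
                                    int32_t rows, int32_t columns)
{
    for (int32_t i = 0; i < rows; ++i) {
        for (int32_t j = 0; j < columns; ++j) {
            int32_t pos = i * columns + j;
            /* Compute the average once, in 32 bits so the sum cannot overflow. */
            uint32_t grey = ((uint32_t)source[pos].red
                             + (uint32_t)source[pos].green
                             + (uint32_t)source[pos].blue) / 3u;
            target[pos].red = (uint16_t)grey;
            target[pos].green = (uint16_t)grey;
            target[pos].blue = (uint16_t)grey;
            target[pos].opacity = 0;
        }
    }
    return 0;
}
```

Running this on the same input as the GPU version and comparing the two target buffers pixel by pixel would show whether a kernel that does launch also produces the expected values.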
I’m using the following commands to create the shared objects:
pgcc -acc -ta=tesla,cuda6.0,keep -O0 -Minfo -fPIC -c -o service_grey_gpu.o ../csrc/service_grey/service_grey_gpu.c
pgcc -acc -ta=tesla,cuda6.0,keep -O0 -Minfo -shared -o libservice_grey_gpu.so service_grey_gpu.o
There are no errors or warnings during compilation. Here is the output:
run_service_grey_gpu:
8, Generating copyin(pixpack_source[:rows*columns])
Generating copyout(pixpack_target[:rows*columns])
Generating Tesla code
11, Loop is parallelizable
13, Loop is parallelizable
Accelerator kernel generated
11, #pragma acc loop gang /* blockIdx.y */
13, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
pgcc -acc -ta=tesla,cuda6.0,keep -O0 -Minfo -shared -o libservice_grey_gpu.so service_grey_gpu.o
I’ve also tried compiling the shared objects without the -acc flag, so that the functions run on the CPU. This works well, but I want to use the GPU.
So my assumption is that the problem lies in how loading several libraries interacts with GPU execution.
Hopefully someone has an idea why this isn’t working.
Chris