Passing shared and local memory arrays to device functions

I want to create the equivalent of the following C function in CUDA Fortran. I don’t have to do it this way, but it would be nice to know whether it is possible to write functions that take shared and local arguments. The 2013 reference guide made it seem that only allocated arrays could be passed to device functions (using pointers) and that only arguments residing in global device memory could be used.

Below is the C code, where a is in local memory, b is an array in shared memory, and c is an array in local memory.

// a: per-thread scalar; b: array in shared memory; c: per-thread array in local memory.
// Accumulates c[i] += a * b[i] for i = 0..15 (fully unrolled by hand).
static __device__ void update(double a, double *b, double *c) {
        c[0] += a * b[0];
        c[1] += a * b[1];
        c[2] += a * b[2];
        c[3] += a * b[3];
        c[4] += a * b[4];
        c[5] += a * b[5];
        c[6] += a * b[6];
        c[7] += a * b[7];
        c[8] += a * b[8];
        c[9] += a * b[9];
        c[10] += a * b[10];
        c[11] += a * b[11];
        c[12] += a * b[12];
        c[13] += a * b[13];
        c[14] += a * b[14];
        c[15] += a * b[15];
}
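For reference, here is a minimal CUDA C sketch of how a function like update() is typically called with b staged in shared memory and c held as a per-thread array, which is the pattern the question wants to reproduce in CUDA Fortran. It assumes nvcc and a GPU of compute capability 2.0 or later, where unqualified pointer parameters are generic addresses that can point into shared, local, or global memory. The names N, demo_kernel, and run_demo are hypothetical, and update() is rewritten as a loop the compiler can unroll because N is a compile-time constant.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

#define N 16  // hypothetical: length of b and c, matching update() above

// Same accumulation as update() above, as a loop the compiler can fully unroll.
static __device__ void update(double a, const double *b, double *c) {
#pragma unroll
    for (int i = 0; i < N; ++i)
        c[i] += a * b[i];
}

// Hypothetical demo kernel: b is staged in shared memory once per block,
// c is a per-thread array, and a is a per-thread scalar.
__global__ void demo_kernel(const double *b_global, const double *a_global,
                            double *out, int nthreads) {
    __shared__ double b_shared[N];  // shared by all threads in the block
    double c_local[N];              // private to this thread (registers or local memory)

    // Strided copy so any block size fills all N shared elements.
    for (int i = threadIdx.x; i < N; i += blockDim.x)
        b_shared[i] = b_global[i];
    __syncthreads();  // reached by every thread in the block

    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < nthreads) {
        for (int i = 0; i < N; ++i) c_local[i] = 0.0;
        // A shared-memory pointer and a local array passed to the same device function.
        update(a_global[tid], b_shared, c_local);
        for (int i = 0; i < N; ++i) out[(size_t)tid * N + i] = c_local[i];
    }
}

// Hypothetical host helper: copies inputs, launches demo_kernel, copies results back.
// Returns 0 on success, nonzero if any CUDA call reported an error.
int run_demo(const double *b_host, const double *a_host, double *out_host, int nthreads) {
    double *b_d = nullptr, *a_d = nullptr, *out_d = nullptr;
    size_t out_bytes = (size_t)nthreads * N * sizeof(double);
    int failed = 0;
    failed |= cudaMalloc((void **)&b_d, N * sizeof(double)) != cudaSuccess;
    failed |= cudaMalloc((void **)&a_d, nthreads * sizeof(double)) != cudaSuccess;
    failed |= cudaMalloc((void **)&out_d, out_bytes) != cudaSuccess;
    if (!failed) {
        failed |= cudaMemcpy(b_d, b_host, N * sizeof(double), cudaMemcpyHostToDevice) != cudaSuccess;
        failed |= cudaMemcpy(a_d, a_host, nthreads * sizeof(double), cudaMemcpyHostToDevice) != cudaSuccess;
        const int block = 128;
        demo_kernel<<<(nthreads + block - 1) / block, block>>>(b_d, a_d, out_d, nthreads);
        failed |= cudaGetLastError() != cudaSuccess;
        failed |= cudaMemcpy(out_host, out_d, out_bytes, cudaMemcpyDeviceToHost) != cudaSuccess;
    }
    cudaFree(b_d);
    cudaFree(a_d);
    cudaFree(out_d);
    return failed;
}
```

With b[i] = i + 1 and a[t] = t + 1, each output element out[t * N + i] should equal (t + 1) * (i + 1). In CUDA C this calling pattern should not depend on how the code is linked; for CUDA Fortran, the reply below indicates the equivalent call is expected to work only when the device routine can be inlined.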

Hi Mr. Savage,

This will depend on how you’re linking. If you are using RDC (relocatable device code, -Mcuda=rdc) with separate compilation and link steps, then all pointers must be global.

If you are using the older method without RDC, i.e. when device functions are in the same module as the caller and get inlined instead of called, then you should be able to pass shared and local variables.

-Mat