I want to create the equivalent of the following C function in CUDA Fortran. I don’t have to do it this way, but it would be nice to know whether it is possible to write device functions that take shared and local arguments. The 2013 reference guide made it seem that only allocated arrays can be passed to device functions using pointers, and that only arguments in global device memory can be used.
Below is the C code, where a is in local memory, b is an array in shared memory, and c is an array in local memory.
static __device__ void update(double a, double *b, double *c) {
    c[0] += a * b[0];
    c[1] += a * b[1];
    c[2] += a * b[2];
    c[3] += a * b[3];
    c[4] += a * b[4];
    c[5] += a * b[5];
    c[6] += a * b[6];
    c[7] += a * b[7];
    c[8] += a * b[8];
    c[9] += a * b[9];
    c[10] += a * b[10];
    c[11] += a * b[11];
    c[12] += a * b[12];
    c[13] += a * b[13];
    c[14] += a * b[14];
    c[15] += a * b[15];
}
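To make the memory placement concrete, here is a minimal sketch of how the function above might be called from a kernel in CUDA C. The kernel name demo_kernel, the in/out buffers, the value of a, and the launch configuration are all assumptions made up for illustration; update is written as a loop here for brevity, which should be equivalent to the unrolled version.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same operation as the unrolled update() above, written as a loop.
static __device__ void update(double a, double *b, double *c) {
    for (int i = 0; i < 16; ++i) {
        c[i] += a * b[i];
    }
}

// Hypothetical kernel: name, arguments, and data layout are assumptions.
__global__ void demo_kernel(const double *in, double *out) {
    __shared__ double b[16];  // shared memory: one copy per thread block
    double c[16];             // per-thread local array
    double a = 2.0;           // per-thread scalar (arbitrary value)

    // The first 16 threads of the block fill the shared array.
    if (threadIdx.x < 16) {
        b[threadIdx.x] = in[threadIdx.x];
    }
    __syncthreads();

    for (int i = 0; i < 16; ++i) {
        c[i] = 0.0;
    }

    // b points to shared memory, c to thread-local memory.
    update(a, b, c);

    // One thread copies its local result to global memory for inspection.
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        for (int i = 0; i < 16; ++i) {
            out[i] = c[i];
        }
    }
}

int main() {
    double h_in[16], h_out[16];
    for (int i = 0; i < 16; ++i) {
        h_in[i] = i;
    }

    double *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    demo_kernel<<<1, 32>>>(d_in, d_out);  // one block of 32 threads

    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    for (int i = 0; i < 16; ++i) {
        printf("%g ", h_out[i]);
    }
    printf("\n");

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

The question is whether a CUDA Fortran device subroutine can accept the same mix of actual arguments: a scalar passed by value, a shared array, and a thread-local array.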