Error with nollvm: Unsupported array datatype

Hi. When I compile the following code with nollvm, I get this error:

PGC-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): Unsupported array datatype (main.c: 1)

Without nollvm it compiles, but I have to use nollvm in the code I’m working on because it uses the cuRAND library. Any idea what the problem is?

The code:

#include <stdio.h>
#include <stdlib.h>

#define N 1000
double max = 100000;
double v[N][N];
#pragma acc declare create(max, v)

#pragma acc routine nohost
void routine(int i, int j) {
        if (v[i][j] > max) v[i][j] = max;
}

int main (int argc, char **argv) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            v[i][j] = rand() * max + 5;

    #pragma acc parallel loop collapse(2) copyin(max, v)
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            routine(i, j);

    printf (">>>> %lf %lf %lf %lf\n", v[0][0], v[0][N/4], v[N/4][0], v[N/4][N/4]);
    fflush(stdout);
    return 0;
}

PS: Also, why does the error point to main.c:1? There is nothing wrong on line 1…

Hi Matheus Tavares,

Unfortunately, the old CUDA code generator (i.e. nollvm) doesn’t support multidimensional arrays in the “declare” directive.

Can you flatten “v” to be a single dimension (see below), or maybe pass “v” to the routine so you can use a regular data region instead of “declare”?

Note that you’ll also get a redefinition error with “max”, since “max” is defined in the CUDA header files. This is why I changed “max” to “mmax” below.

% cat main.c
#include <stdio.h>
#include <stdlib.h>

#define N 1000
double mmax = 100000;
double v[N*N];
#pragma acc declare create(mmax, v)

#pragma acc routine nohost
void routine(int i, int j) {
        long idx = i*N+j;
        if (v[idx] > mmax) v[idx] = mmax;
}

int main (int argc, char **argv) {
    long idx;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            idx = i*N+j;
            v[idx] = rand() * mmax + 5;
        }
    }
    #pragma acc update device(mmax,v)
    #pragma acc parallel loop collapse(2)
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            routine(i, j);

    #pragma acc update self(v)
    printf (">>>> %lf %lf %lf %lf\n", v[0], v[N/4], v[(N/4)*N], v[((N/4)*N)+(N/4)]);
    fflush(stdout);
    return 0;
}
% pgcc main.c -ta=tesla:cc70,nollvm -Minfo=accel
routine:
     10, Generating acc routine seq
         Generating Tesla code
main:
     23, Generating update device(v[:],mmax)
         Accelerator kernel generated
         Generating Tesla code
         24, #pragma acc loop gang, vector(128) collapse(2) /* blockIdx.x threadIdx.x */
         25,   /* blockIdx.x threadIdx.x collapsed */
     29, Generating update self(v[:])
% a.out
>>>> 100000.000000 100000.000000 100000.000000 100000.000000
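
As a rough sketch of the second option mentioned above (passing the array to the routine and using a data clause instead of “declare”), something like the following might work. The names clamp_elem and clamp_all are made up for illustration, and this assumes the nollvm code generator accepts a 2D array passed as an argument, which is untested here.

```c
#define N 1000

/* Device routine: the array and the limit arrive as arguments,
   so no global "declare" data is involved. */
#pragma acc routine seq
void clamp_elem(double (*v)[N], int i, int j, double limit) {
    if (v[i][j] > limit) v[i][j] = limit;
}

/* Clamp every element of an N x N array. The copy clause on the
   compute construct moves v to the device and back again. */
void clamp_all(double (*v)[N], double limit) {
    #pragma acc parallel loop collapse(2) copy(v[0:N][0:N])
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            clamp_elem(v, i, j, limit);
}
```

Calling clamp_all(v, mmax) from main after filling v on the host would replace both the “declare” directive and the two “update” directives, since the copy clause handles the transfers.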

-Mat

Ok, thanks!

Good to know that flattening the array solves the problem! But the code I’m working on has many more 2D arrays. Is there any other way to make it work without flattening them all? I ask because handling 1D arrays would require a lot of refactoring.

I could use LLVM, but that won’t work with CUDA libraries such as cuRAND, right? Any other suggestions?

Thanks again for the attention.

EDIT: Ok, I managed to flatten my 2D arrays and it’s working now, thanks!
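
For anyone facing the same refactor, one way to keep the changes small is an accessor macro, so that v[i][j] becomes V(i, j) at each use site. This is only a minimal sketch; the macro names IDX and V are hypothetical and do not come from the thread.

```c
#define N 1000

/* Row-major offset of element (i, j); the cast guards against
   int overflow if N grows large. */
#define IDX(i, j) ((long)(i) * N + (j))

/* Flattened storage, one dimension as in the reply above. */
double v[N * N];
#pragma acc declare create(v)

/* Accessor so call sites still read like 2D indexing. */
#define V(i, j) v[IDX(i, j)]
```

Because the macros expand at preprocessing time, they can be used the same way in host code and inside an “acc routine”, for example: if (V(i, j) > mmax) V(i, j) = mmax;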