Multi-dimensional array copyin

I have a 2D matrix SIZE x SIZE, which I’m trying to copy to the GPU.

I allocate the matrix this way:

float (*a)[SIZE] = (float (*)[SIZE]) malloc(SIZE * SIZE * sizeof(float));

And I have this on my ACC region:

#pragma acc data copyin(a[0:SIZE][0:SIZE])

The compiler output is:

Generating copyin(a[0:1024][0:])

What does a[0:1024][0:] mean?

If instead of this, I have only one array with SIZE*SIZE length (and copy everything to the GPU), then the code is about 10x slower to execute.

Any ideas?

Hi lechat,

In OpenACC, only contiguous blocks of data can be copied. Since SIZE is known and your array is contiguous, the compiler is copying the whole array in one large 1-D block.
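To make the point about contiguity concrete, here is a minimal sketch, not your actual code. It assumes SIZE is a compile-time constant of 1024 (taken from the compiler message), and the function names sum_matrix_2d, sum_flat and run_demo, the fill value of 1.0f and the summing loops are all made up for illustration. It allocates the matrix with a single malloc and then reads the same memory both through the 2-D pointer and through a flat 1-D pointer.

```c
#include <assert.h>
#include <stdlib.h>

#define SIZE 1024  /* assumed value, taken from the compiler message */

/* Hypothetical helper: sum the matrix through the 2-D pointer type. */
double sum_matrix_2d(float (*a)[SIZE])
{
    double sum = 0.0;
    assert(a != NULL);

    /* Both dimensions are spelled out. The rows sit back to back in
       memory, so the whole matrix is one contiguous block. */
    #pragma acc data copyin(a[0:SIZE][0:SIZE])
    {
        #pragma acc parallel loop collapse(2) reduction(+:sum)
        for (int i = 0; i < SIZE; i++)
            for (int j = 0; j < SIZE; j++)
                sum += a[i][j];
    }
    return sum;
}

/* Hypothetical helper: sum the same memory viewed as a flat 1-D array. */
double sum_flat(const float *b)
{
    double sum = 0.0;
    assert(b != NULL);

    #pragma acc data copyin(b[0:SIZE*SIZE])
    {
        #pragma acc parallel loop reduction(+:sum)
        for (int i = 0; i < SIZE * SIZE; i++)
            sum += b[i];
    }
    return sum;
}

/* Allocate once, fill with 1.0f, and sum through both views.
   Returns 0 on success, -1 if the allocation fails. */
int run_demo(double *sum2d, double *sum1d)
{
    float (*a)[SIZE] = malloc(sizeof(float[SIZE][SIZE]));
    if (a == NULL)
        return -1;

    for (int i = 0; i < SIZE; i++)
        for (int j = 0; j < SIZE; j++)
            a[i][j] = 1.0f;

    *sum2d = sum_matrix_2d(a);
    *sum1d = sum_flat(&a[0][0]);  /* same block, 1-D view */

    free(a);
    return 0;
}
```

Calling run_demo from main and printing the two sums should give the same value for both views, since they cover the same SIZE*SIZE floats. A compiler without OpenACC support ignores the pragmas and runs the loops on the host.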

You wrote: "If instead of this, I have only one array with SIZE*SIZE length (and copy everything to the GPU), then the code is about 10x slower to execute."

I’d need an example of your code to determine why this would be the case.

Hope this helps,
Mat

Hi mkcolg,

Thank you for your reply, it was very useful.

I’m sorry, the code with the “serialized” matrix is not 10x slower; that was a mistake on my side.

Thank you