I want to overlay (map) multidimensional arrays onto one-dimensional memory inside my CUDA kernels. Yesterday I was surprised to find out that in C this can be done with a typedef:
typedef int row_t[N];
Then, after defining matrix as
row_t *matrix = (row_t *)arr1;
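I can access the 1D array arr1 as a 2D array:
matrix[i][j]
With typedef int mat_t[M][N]; in place of row_t, the same cast gives 3D access, matrix[i][j][k].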
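For reference, here is a minimal C sketch of the typedef overlay with compile-time bounds. The sizes M and N and the helper names get2 and get3 are made up for illustration; the comments show the flat index each access resolves to.

```c
#include <assert.h>

/* Illustrative compile-time sizes (assumptions for this sketch). */
#define M 4
#define N 5

typedef int row_t[N];      /* one row: N ints            */
typedef int mat_t[M][N];   /* one slab: M rows of N ints */

/* Read element [r][k] of a flat buffer viewed as rows of N ints. */
int get2(int *arr1, int r, int k)
{
    row_t *matrix = (row_t *)arr1;
    return matrix[r][k];          /* same element as arr1[r*N + k] */
}

/* Read element [i][j][k] of a flat buffer viewed as M x N slabs. */
int get3(int *arr1, int i, int j, int k)
{
    mat_t *matrix = (mat_t *)arr1;
    return matrix[i][j][k];       /* same element as arr1[(i*M + j)*N + k] */
}
```

For example, with a buffer filled by buf[t] = t, get3(buf, 2, 3, 4) reads buf[(2*4 + 3)*5 + 4], i.e. buf[59], and get2(buf, 7, 1) reads buf[36].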
Please see the post “Casting between 1D and 2D arrays” at
http://bytes.com/topic/c/answers/845647-casting-between-1d-2d-arrays
I also found a description of typedef in “The C Book” at
http://publications.gbdirect.co.uk/c_book/chapter8/typedef.html
I pass NumPy arrays to the GPU from Python. They are multidimensional, and their dimensions may vary depending on the number of threads and blocks involved. This gives me scaling flexibility without recompiling the CUDA code. The CUDA code below is meaningless in itself, but it shows the principle: the float array arr1 is treated as a 3D array float matr[L][M][N].
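When the bounds are only known at run time, the element that matr[i][j][k] would denote can also be computed by hand; this is the same arithmetic the overlay performs for float matr[L][M][N]. nvcc compiles .cu sources as C++, and standard C++ has no variable-length array types, so this explicit form sidesteps the runtime-bound typedef altogether. The sketch below is plain C and assumes the NumPy arrays arrive C-contiguous (NumPy's default row-major order). The names idx3 and fill_like_kernel are hypothetical, and in an actual kernel idx3 would additionally be marked __device__.

```c
#include <assert.h>

/* Offset of element [i][j][k] in a row-major L x M x N array stored flat.
   The outermost extent L does not appear in the formula, just as it does
   not appear in typedef float mat_t[M][N].
   (Hypothetical helper; mark it __device__ to call it from a CUDA kernel.) */
static inline int idx3(int i, int j, int k, int M, int N)
{
    return (i * M + j) * N + k;
}

/* Host-side analogue of the body of work_on_array: fill a flat buffer of
   L*M*N floats as if it were float matr[L][M][N]. */
void fill_like_kernel(float *arr1, int L, int M, int N)
{
    for (int i = 0; i < L; ++i)
        for (int j = 0; j < M; ++j)
            for (int k = 0; k < N; ++k)
                arr1[idx3(i, j, k, M, N)] = (float)(i * j * k);
}
```

With L=5, M=7, N=4, the last element matr[4][6][3] lands at offset (4*7 + 6)*4 + 3 = 139, which is the final slot of a 140-element buffer.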
__global__ void work_on_array(float *arr1) {
#define L threadIdx.x
#define M threadIdx.y
#define N blockIdx.x
    typedef float mat_t[M][N];
    int i, j, k;
    /* Make arrays arr1[] and matr[L][M][N] overlay */
    mat_t *matr = (mat_t *) arr1;
    for (i=0; i < L; ++i)
        for (j=0; j < M; ++j)
            for (k=0; k < N; ++k)
                matr[i][j][k] = i*j*k;
}
However, nvcc does not accept this. The error message, pointing at the typedef line, is
“error: this operator is not allowed in an integral constant expression”
If I use plain variables as the dimensions instead, e.g. “int L=5, M=7, N=4”, the message is
“error: expression must have a constant value”
By the way, my gcc 4.4.3 accepts declarations like “typedef float mat_t[M][N];” where M and N are variables (a C99 variable-length array type, which gcc also supports as an extension). This is EXTREMELY convenient in some cases. The code below compiles and works on Linux (you are encouraged to try it):
#include <assert.h>
#include <stdio.h>

void compare(int *array, int N, int M)
{
    typedef int mat_t[N][M];
    int i, j, k;
    mat_t *matrix = (mat_t *)array; /* 3D array [any][N][M] */
    for (i=0; i < N; ++i)
        for (j=0; j < N; ++j)
            for (k=0; k < M; ++k)
            {
                assert( &matrix[i][j][k] == &array[(i*N+j)*M+k] );
                assert( matrix[i][j][k] == array[(i*N+j)*M+k] );
                printf("matrix[%d][%d][%d]=%d\n", i, j, k, matrix[i][j][k]);
            }
}

int main(void)
{
    int i, N=9, M=5;
    int foo[N*N*M];
    for (i=0; i < N*N*M; ++i) foo[i] = (7*i+11)*3+8;
    compare(foo, N, M);
    return 0;
}
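One C99 detail worth knowing when relying on this: the bounds of a variably modified typedef are evaluated once, when the declaration is reached, not each time the type is used (the C99 section on type definitions, 6.7.7, gives an example of exactly this). A small sketch; vla_typedef_size is a made-up helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the size of a variably modified typedef after modifying n.
   The bound expressions n and m are evaluated when the typedef is reached,
   so the later n += 1 does not change mat_t. */
size_t vla_typedef_size(int n, int m)
{
    typedef int mat_t[n][m];   /* bounds evaluated here, at run time */
    n += 1;                    /* has no effect on mat_t             */
    return sizeof(mat_t);      /* computed at run time for a VLA type */
}
```

So vla_typedef_size(9, 5) yields 45 * sizeof(int), not 50 * sizeof(int).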
Does anybody know whether such array overlaying is possible in principle in CUDA C/C++?
Does NVIDIA plan to upgrade nvcc to allow variable bounds in typedef float mat_t[M][N]?
Thanks to everybody in advance.