How do you allocate 4D or 5D arrays on the GPU?

We can use cudaMalloc3D for 3D arrays, but what about 4D (and more) arrays?
Should I use something like this:

// Host code
cudaExtent extent = make_cudaExtent(DIM_1*sizeof(float), DIM_2, DIM_3*DIM_4);
cudaPitchedPtr devPitchedPtr;
cudaMalloc3D(&devPitchedPtr, extent);
MyKernel<<<gridSize, blockSize>>>(devPitchedPtr, DIM_1, DIM_2, DIM_3*DIM_4);  // launch configuration elided

// Device code
__global__ void MyKernel(cudaPitchedPtr devPitchedPtr,
                         int dim_1, int dim_2, int dim_3)
{
	char* devPtr = (char*)devPitchedPtr.ptr;  // .ptr is void*, so cast before byte arithmetic
	size_t pitch = devPitchedPtr.pitch;
	size_t slicePitch = pitch * dim_2;
	for (int z = 0; z < dim_3; ++z) {
		char* slice = devPtr + z * slicePitch;
		for (int y = 0; y < dim_2; ++y) {
			float* row = (float*)(slice + y * pitch);
			for (int x = 0; x < dim_1; ++x) {
				float element = row[x];
			}
		}
	}
}

What do you suggest?

Thanks for your reply

The easiest way to allocate an N-D array is to allocate a flat 1D array and handle the address calculation yourself:

int myAddress = l*(sizex*sizey*sizez) + k*(sizex*sizey) + j*(sizex) + i;

and along those lines.

I would like to keep the memory alignment. Finally, I can do this with cudaMallocPitch:

// Host code: one padded row per (j, k, l) triple
cudaMallocPitch((void**)&devPtr, &pitch_1, dim_1*sizeof(float), dim_2*dim_3*dim_4);

// Device code: address of element (i, j, k, l)
float* Val = (float*)((char*)devPtr + l*(dim_3*dim_2*pitch_1) + k*(dim_2*pitch_1) + j*pitch_1) + i;

I don’t understand the usefulness of cudaMalloc3D…
Anyway, thank you

If you’d like to use 4D, 5D, 6D, or however many dimensions you want, take a look at the hypergrid I posted here.
If you have trouble retrieving it, I can provide it for you.