Is it possible to allocate multidimensional arrays in the GPU’s global memory?
Currently, I allocate two-dimensional arrays using linear addressing, i.e. as a single one-dimensional array.
Since I also need arrays of 2D arrays, I did the following:
- A type *a[10] is allocated in host memory ([10] is static here, but it could just as well be dynamic);
- A type **a_device is allocated on the device, in this case with size 10*sizeof(type *);
- For i = 0…9, a[i] is allocated on the device with the correct size (width*height of the 2D array);
- Finally, I copy the contents of a into a_device, and it works.
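A different way to hold the ten 2D arrays, sketched here in plain C under the assumption that all slices share the same width and height, is one contiguous allocation indexed by hand; this avoids the table of device pointers entirely. The helper name slice_index and its parameters are hypothetical, and only the index arithmetic is shown (no CUDA calls).

```c
#include <stddef.h>

/* Hypothetical helper (assumes every slice is height x width):
   offset of element (y, x) of slice n when all slices are stored back
   to back in ONE buffer, e.g. a single cudaMalloc of
   count * height * width elements. No per-slice pointers are needed. */
static size_t slice_index(size_t n, size_t y, size_t x,
                          size_t height, size_t width)
{
    return (n * height + y) * width + x;
}
```

With 10 slices of 4 x 3 elements, slice_index(9, 3, 2, 4, 3) is 119, the last element of a 120-element buffer.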
However, this trick does not extend to more dimensions, because I cannot access a_device[n] from the host.
Is there a way to allocate multidimensional arrays in device memory?
2D linear addressing is really easy, and 3D would be too, but I need at least 5 dimensions.
Unfortunately, the algorithms I am implementing require such a structure.
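Linear addressing extends to any number of dimensions. A minimal sketch in plain C, assuming row-major order (last index varies fastest); linear_index, dims and idx are hypothetical names. In CUDA code the function would additionally need the __host__ __device__ qualifiers to be callable from a kernel.

```c
#include <stddef.h>

/* Hypothetical helper: row-major offset of the index tuple idx[0..ndim-1]
   in an array with extents dims[0..ndim-1].
   Horner form: ((i0*d1 + i1)*d2 + i2)*d3 + ...  */
static size_t linear_index(size_t ndim, const size_t *dims, const size_t *idx)
{
    size_t off = 0;
    for (size_t d = 0; d < ndim; ++d)
        off = off * dims[d] + idx[d];   /* shift by this extent, add index */
    return off;
}
```

For a 5D array of shape 2 x 3 x 4 x 5 x 6 (720 elements), the index (1, 2, 3, 4, 5) maps to offset 719, the last element.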
I also thought of using something like [dim1*dim2][dim3*dim4], which should work the same way as my current approach, but it didn't work: it just returned 0s.
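For what it's worth, the [dim1*dim2][dim3*dim4] layout produces exactly the same offsets as a plain row-major 4D layout, which suggests the zeros come from somewhere other than the index scheme itself (for example the allocation size or the copy). The hypothetical C helpers below compute both offsets so they can be compared.

```c
#include <stddef.h>

/* Hypothetical: offset in a flat array viewed as [d1*d2][d3*d4].
   The row combines (i1, i2), the column combines (i3, i4). */
static size_t idx_2x2(size_t i1, size_t i2, size_t i3, size_t i4,
                      size_t d2, size_t d3, size_t d4)
{
    size_t row = i1 * d2 + i2;
    size_t col = i3 * d4 + i4;
    return row * (d3 * d4) + col;
}

/* Hypothetical: plain row-major offset for shape [d1][d2][d3][d4]. */
static size_t idx_4d(size_t i1, size_t i2, size_t i3, size_t i4,
                     size_t d2, size_t d3, size_t d4)
{
    return ((i1 * d2 + i2) * d3 + i3) * d4 + i4;
}
```

Expanding both gives (i1*d2 + i2)*d3*d4 + i3*d4 + i4, so the two layouts are identical in memory.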
I use 4D arrays in part of my code. Note that getting coalesced reads becomes a little complicated in these cases. I use cudaMallocPitch to allocate a “2D” array with width L and height Mx*My*Mz, and then do all the index calculations by hand. Because of my memory access pattern, a single block accesses all elements along the L axis, and those accesses are coalesced because I used cudaMallocPitch. To get the right element, I just access the array at index (i*(Mz*My) + j*My + k)*pitch + threadIdx.x, where pitch is in elements, not bytes (cudaMallocPitch returns the pitch in bytes, so it has to be divided by the element size).
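The index expression above can be sketched as a plain C helper, assuming float elements and that the kernel passes threadIdx.x as x; pitched_index and pitch_elems are hypothetical names. Note that with a stride of My on j, the rows are dense and distinct only if j runs over [0, Mz) and k over [0, My).

```c
#include <stddef.h>

/* Hypothetical helper mirroring (i*(Mz*My) + j*My + k)*pitch + x.
   Each row holds L elements padded to pitch_elems elements.
   cudaMallocPitch reports the pitch in BYTES, so on the host compute
   pitch_elems = pitch_bytes / sizeof(float) once. */
static size_t pitched_index(size_t i, size_t j, size_t k, size_t x,
                            size_t My, size_t Mz, size_t pitch_elems)
{
    size_t row = i * (Mz * My) + j * My + k;  /* j in [0, Mz), k in [0, My) */
    return row * pitch_elems + x;             /* x in [0, L), L <= pitch_elems */
}
```

In a kernel the element would then be read as data[pitched_index(i, j, k, threadIdx.x, My, Mz, pitch_elems)].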
Isn’t indexing with so many calculations slow?
I’m trying to achieve really high performance, because the algorithms I’m computing would take seconds on a common computer.
Calculations are very cheap, and uncoalesced device memory accesses are very expensive; we are talking about a factor on the order of 100 here. Sometimes it’s even cheaper to recalculate a result than to fetch it from global memory.
Oh, that’s really interesting.
Thank you for the answers.
I think I will write a few macros for more human-readable 2D, 3D and 4D linear mapping, and then convert my current array code to access elements with 3D-style indices on linear arrays.
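Such macros might look like this: a sketch assuming row-major order with the last index varying fastest; the names IDX2, IDX3 and IDX4 are hypothetical. Because macros are expanded textually, they work unchanged in both host and device code.

```c
/* Hypothetical row-major index macros; the last index varies fastest.
   Every argument is parenthesized so expressions such as
   IDX2(y + 1, x, nx) expand correctly. */
#define IDX2(y, x, nx)                ((y) * (nx) + (x))
#define IDX3(z, y, x, ny, nx)         (IDX2((z), (y), (ny)) * (nx) + (x))
#define IDX4(w, z, y, x, nz, ny, nx)  (IDX3((w), (z), (y), (nz), (ny)) * (nx) + (x))
```

A __host__ __device__ inline function would do the same job with type checking; the macros simply avoid the annotations.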