Multi-dimensional arrays in global memory

I would like to create a three-dimensional array with dimensions of inconsistent sizes. Most of the documentation for multi-dimensional arrays advocates creating a one-dimensional array of size length × width × height and manually calculating the correct index. This would be unacceptably inefficient in my application, as I have a small number of very large sub-arrays and a large number of much smaller sub-arrays. If I were to allocate an array with consistent dimensions large enough to fit all of the sub-arrays, I wouldn’t have enough memory.
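The flattened scheme in question looks roughly like this (length, width, and height are placeholder dimensions):

/* One contiguous slab of length*width*height floats, indexed manually.
   This only works when every sub-array shares the same dimensions. */
float *a = (float *)malloc(length * width * height * sizeof(float));
float v = a[(z * width + y) * length + x];   /* element (x, y, z) */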

On the CPU, I create an array of (float**), link each element of this array to an array of (float*), and link each element of those arrays to an array of floats. cudaMalloc only accepts a (void**), but I presume all pointers are the same size, so I should be able to cast safely.
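Roughly, the host-side construction I have in mind (depth, rows[], and cols[][] are placeholder names for my inconsistent sizes):

/* Jagged 3-D structure on the host: depth levels, each level i holding
   rows[i] row pointers, each row j holding cols[i][j] floats. */
float ***a = (float ***)malloc(depth * sizeof(float **));
for (int i = 0; i < depth; ++i) {
    a[i] = (float **)malloc(rows[i] * sizeof(float *));
    for (int j = 0; j < rows[i]; ++j)
        a[i][j] = (float *)malloc(cols[i][j] * sizeof(float));
}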

My question really pertains to the linking of all these arrays together. This seems like a serial task best performed on the CPU. Can I cudaMalloc all of my arrays separately, link them together on the CPU, and then cudaMemcpy them onto the GPU? In particular, what is the meaning of the pointer value that cudaMalloc() hands back? Can I assign the pointers returned by cudaMalloc for the sub-arrays to the entries of the highest-level array on the CPU, cudaMemcpy the highest-level array to the GPU, and expect the structure to be correct in the global memory of the device? The documentation indicates that the GPU uses normal 32-bit pointers, so this seems like a reasonable thing to do, assuming that cudaMalloc just returns 32-bit addresses into global memory, but perhaps I am confused.

Thanks,
Jason

Well, the device pointers are the same size as the host’s pointers, so they will be 64-bit on 64-bit platforms. But that is just a side issue.

You’ve got it right: cudaMalloc just returns a pointer into device memory. So you can cudaMalloc each sub-array, collect the resulting device pointers in a float** list on the host, cudaMalloc a float** list on the device, and copy the host list of pointers into it. The compiler will tell you “warning, can’t tell what pointer points to: assuming global memory space” when you dereference that memory, but that is OK, because you have in fact allocated global memory.
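Spelled out for two levels, the recipe looks something like this (a minimal sketch; n and sizes[] are placeholder names, and error checking is omitted for brevity):

/* 1. cudaMalloc each row, recording the device pointers on the host.
   2. cudaMalloc the device-side pointer table.
   3. Copy the host list of device pointers into that table. */
float **h_rows = (float **)malloc(n * sizeof(float *));
for (int i = 0; i < n; ++i)
    cudaMalloc((void **)&h_rows[i], sizes[i] * sizeof(float));

float **d_rows;                                /* device pointer table */
cudaMalloc((void **)&d_rows, n * sizeof(float *));
cudaMemcpy(d_rows, h_rows, n * sizeof(float *), cudaMemcpyHostToDevice);

d_rows can then be passed to a kernel and dereferenced there as a normal float**.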

You’ll have to be careful with your memory access pattern to get coalesced reads, since your pointers are now being read from global memory too. How will your threads access the list? If a block takes one of the float* arrays and does stuff with it, then you can have threadIdx.x == 0 load that float* from the float** into shared memory and have the whole block work on it.
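For example, something along these lines (a rough sketch; the kernel and argument names are made up, with one block per sub-array):

__global__ void process(float **d_rows, int *d_sizes)
{
    __shared__ float *row;             /* one pointer per block */
    if (threadIdx.x == 0)
        row = d_rows[blockIdx.x];      /* single global read of the pointer */
    __syncthreads();

    int n = d_sizes[blockIdx.x];
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        row[i] *= 2.0f;                /* consecutive threads hit consecutive
                                          addresses, so the reads coalesce */
}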

Thanks. Sounds like everything should work out just ducky, then. By the way, do you know why cudaMalloc takes in a (void**) and alters the content of the provided address rather than just returning a (void*) like normal malloc()? Given that the two would seem to have the same overall effect, it seems odd to deviate from the standard C usage.

Jason

Because every cuda* function (except cudaGetErrorString) returns a cudaError_t indicating whether the call was successful and, if not, what the error was. Presumably the developers made that design choice so that error handling is completely consistent across all cuda* functions, and macros such as CUDA_SAFE_CALL can be used to check every call.
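CUDA_SAFE_CALL itself comes from the SDK’s cutil headers, but a hand-rolled equivalent is only a few lines (a sketch with a made-up macro name):

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Abort with file/line context if a CUDA runtime call fails. */
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err = (call);                                 \
        if (err != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err));                     \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

Usage: CUDA_CHECK(cudaMalloc((void **)&d_ptr, bytes));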