3D pointer support in CUDA

vrnova · March 5, 2009, 7:01pm

Hello, in normal C programming we could define a 3D pointer such as int ***ptr so that we can access a 3D volume in the format ptr[z][y][x]. Can we do this in CUDA? It looks like the cudaMalloc3D function still allocates a 1D linear memory block. Maybe I am wrong; please correct me.

tmurray · March 5, 2009, 7:40pm

That’s not a “3D pointer.” That’s a pointer to a pointer to a pointer to an int. The only thing 3D about it is that you have to dereference three times to get to a value. This is very far from optimal in CUDA (read about memory coalescing to see why), so you get a contiguous 1D array that you can index into to perform 3D lookups.
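To make the "index into a contiguous 1D array" idea concrete, here is a minimal sketch of the index arithmetic. The helper name flat_index and its parameters are hypothetical, and it assumes tightly packed rows with x varying fastest; it ignores the row pitch that cudaMalloc3D may add.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map (x, y, z) to an offset in a contiguous
 * volume stored with x varying fastest, then y, then z.
 * Assumes tightly packed rows (no pitch or padding). */
static size_t flat_index(size_t x, size_t y, size_t z,
                         size_t width, size_t height)
{
    assert(x < width && y < height);
    /* Same value as z*width*height + y*width + x, with one fewer multiply. */
    return (z * height + y) * width + x;
}
```

A lookup then reads volume[flat_index(x, y, z, width, height)], which costs a few integer operations and a single memory access.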

vrnova · March 5, 2009, 7:59pm

You mean ptr[z][y][x] is less optimal than ptr[z*imageWidth*imageHeight + y*imageWidth + x]? These two involve the same amount of calculation behind the scenes, don’t they?

Jamie_K · March 5, 2009, 8:09pm

ptr[z][y][x] requires three memory fetches, each depending on the result of the one before. Fetching from memory is slow compared to calculating an index.

If you really wanted, you could create your own pointers-to-pointers data structure and initialize it (just like you would have to do in regular C anyway), and prove to yourself that it is slower.
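As a starting point for that experiment, here is one possible host-side sketch in plain C. The Volume3 type and the volume_init and volume_free functions are hypothetical names, and keeping the values in one contiguous block with two pointer tables on top is only one of several ways to build such a structure. On the GPU the tables would have to hold device addresses, which this host-only sketch does not attempt.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pointers-to-pointers volume, for comparison only. */
typedef struct {
    int   *data;    /* depth*height*width values, contiguous      */
    int  **rows;    /* depth*height pointers, one per (z, y) row   */
    int ***slices;  /* depth pointers, one per z slice             */
} Volume3;

static void volume_free(Volume3 *v)
{
    free(v->data);
    free(v->rows);
    free(v->slices);
    v->data = NULL;
    v->rows = NULL;
    v->slices = NULL;
}

/* Returns 0 on success, -1 if any allocation fails. */
static int volume_init(Volume3 *v, size_t depth, size_t height, size_t width)
{
    assert(depth > 0 && height > 0 && width > 0);
    v->data   = malloc(depth * height * width * sizeof *v->data);
    v->rows   = malloc(depth * height * sizeof *v->rows);
    v->slices = malloc(depth * sizeof *v->slices);
    if (!v->data || !v->rows || !v->slices) {
        volume_free(v);
        return -1;
    }
    for (size_t z = 0; z < depth; ++z) {
        v->slices[z] = v->rows + z * height;
        for (size_t y = 0; y < height; ++y)
            v->rows[z * height + y] = v->data + (z * height + y) * width;
    }
    return 0;
}
```

With this in place, v.slices[z][y][x] and v.data[(z * height + y) * width + x] name the same element. The first form loads slices[z], then the row pointer, then the value; the second computes an offset and loads once, so timing the two in a loop is a fair way to compare them.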

tmurray · March 5, 2009, 8:40pm

Yes, the first is pretty awful. You should probably learn all about how caching on the CPU works and why the first thing is really really bad if you don’t have statically allocated arrays (although even that may depend on your compiler).

Mark_Johnstone · March 5, 2009, 10:29pm

In particular, int ***p does not require the memory to be contiguous. If you want to be efficient, you might want to allocate the memory contiguously (for purposes of prefetching, etc.). In some cases you do not want to allocate the memory contiguously, but that is going to be the exception.

What you really want to do is think about the locality of access of your data, and then optimize the computation to match that.
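As a small illustration of matching the computation to the layout, the sketch below walks a contiguous volume with x in the innermost loop, so consecutive iterations touch adjacent addresses. The function name sum_volume and the x-fastest layout are assumptions made for the example. In a CUDA kernel the analogous choice is usually to have neighboring threads handle neighboring x values, which is the pattern memory coalescing rewards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum a contiguous volume stored x-fastest.
 * The loop nest follows the memory layout, so the offset grows by
 * exactly one element per inner iteration. */
static long sum_volume(const int *data,
                       size_t depth, size_t height, size_t width)
{
    assert(data != NULL);
    long total = 0;
    for (size_t z = 0; z < depth; ++z)
        for (size_t y = 0; y < height; ++y)
            for (size_t x = 0; x < width; ++x)
                total += data[(z * height + y) * width + x];
    return total;
}
```

Swapping the loops so that z is innermost gives the same sum but strides through memory by width*height elements per step, which is the kind of mismatch worth looking for.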

–Mark