Hello, in normal C programming we can define a 3D pointer such as int ***ptr so that we can access a 3D volume in the form ptr[z][y][x]. Can we do this in CUDA? It looks like the cudaMalloc3D function still allocates a 1D linear memory block… maybe I am wrong, please correct me.
That’s not a “3D pointer.” That’s a pointer to a pointer to a pointer to an int. The only thing 3D about it is that you have to dereference three times to get to a value. This is very far from optimal in CUDA (read about memory coalescing to see why), so you get a contiguous 1D array that you can index into to perform 3D lookups.
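A minimal C sketch of the contiguous layout described above. It assumes a dense width × height × depth volume of int stored with x varying fastest, then y, then z. The names idx3 and alloc_volume are hypothetical helpers, not part of any CUDA API. The same index arithmetic works inside a kernel on a buffer obtained from cudaMalloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Flat offset of element (x, y, z) in a width x height x depth volume
   stored x-fastest. (z*height + y)*width + x is the same value as
   z*width*height + y*width + x, just written with fewer multiplies. */
static size_t idx3(size_t x, size_t y, size_t z,
                   size_t width, size_t height)
{
    assert(x < width && y < height); /* debug-only bounds check */
    return (z * height + y) * width + x;
}

/* One allocation for the whole volume: a single contiguous block, so
   neighbouring x values live at neighbouring addresses.
   Returns NULL on failure; overflow of width*height*depth is not
   checked in this sketch. */
static int *alloc_volume(size_t width, size_t height, size_t depth)
{
    return calloc(width * height * depth, sizeof(int));
}
```

If you use cudaMalloc3D instead of cudaMalloc, rows may be padded, so the row stride is the returned pitch (in bytes) rather than width * sizeof(int).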
You mean ptr[z][y][x] is less optimal than ptr[z*imageWidth*imageHeight + y*imageWidth + x]? Don’t these two involve the same amount of calculation behind the scenes?
ptr[z][y][x] requires three dependent memory fetches (ptr[z], then [y], then [x]), and each must complete before the next address is known. Fetching from memory is slow compared to calculating an index.
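To make the three fetches concrete, here is a small C sketch that spells out what ptr[z][y][x] does on an int ***, next to the flat version. The function names load_ptr3 and load_flat are hypothetical. The flat version's address is pure arithmetic followed by a single load; the pointer version must wait on two loads before it even knows where the value is.

```c
#include <assert.h>
#include <stddef.h>

/* What ptr[z][y][x] does for an int ***ptr, written out step by step.
   Each load depends on the result of the previous one, so their
   latencies add up instead of overlapping. */
static int load_ptr3(int ***ptr, size_t x, size_t y, size_t z)
{
    assert(ptr != NULL);
    int **plane = ptr[z];   /* load 1: row table of plane z   */
    int  *row   = plane[y]; /* load 2: start of row y         */
    return row[x];          /* load 3: the value itself       */
}

/* The flat equivalent: index arithmetic, then one load. */
static int load_flat(const int *base, size_t x, size_t y, size_t z,
                     size_t width, size_t height)
{
    assert(base != NULL);
    return base[(z * height + y) * width + x];
}
```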
If you really wanted, you could create your own pointers-to-pointers data structure and initialize it (just like you would have to do in regular C anyway), and prove to yourself that it is slower.
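For reference, a sketch of the pointers-to-pointers structure built "the regular C way", with one allocation per row. The helpers alloc_ptr3 and free_ptr3 are hypothetical. Note that the rows land wherever the allocator puts them, so the volume is not contiguous. On a GPU you would also have to fill every pointer table with device addresses before a kernel could use it.

```c
#include <assert.h>
#include <stdlib.h>

/* Release a (possibly partially built) structure from alloc_ptr3. */
static void free_ptr3(int ***p, size_t height, size_t depth)
{
    if (!p) return;
    for (size_t z = 0; z < depth; ++z) {
        if (!p[z]) continue;
        for (size_t y = 0; y < height; ++y)
            free(p[z][y]);            /* free(NULL) is a no-op */
        free(p[z]);
    }
    free(p);
}

/* Build int ***p so that p[z][y][x] works, one malloc per row.
   calloc zeroes the pointer tables (null pointers on mainstream
   platforms), so a half-built structure can be freed safely. */
static int ***alloc_ptr3(size_t width, size_t height, size_t depth)
{
    assert(width > 0 && height > 0 && depth > 0);
    int ***p = calloc(depth, sizeof *p);
    if (!p) return NULL;
    for (size_t z = 0; z < depth; ++z) {
        p[z] = calloc(height, sizeof *p[z]);
        if (!p[z]) { free_ptr3(p, height, depth); return NULL; }
        for (size_t y = 0; y < height; ++y) {
            p[z][y] = calloc(width, sizeof *p[z][y]);
            if (!p[z][y]) { free_ptr3(p, height, depth); return NULL; }
        }
    }
    return p;
}
```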
Yes, the first form is pretty awful. You should probably learn how caching works on the CPU and why the first form is really, really bad if you don’t have statically allocated arrays (although even that may depend on your compiler).
In particular, an int *** does not require the memory to be contiguous. If you want to be efficient, you might want to allocate the memory contiguously (for purposes of prefetching, etc.). In some cases you do not want to allocate the memory contiguously, but that is going to be the exception.
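One way to get contiguous allocation while keeping the p[z][y][x] syntax is to allocate the data as a single block and point small row tables into it. This is a sketch; the volume3 type and its functions are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* All data lives in ONE contiguous block; two small pointer tables
   index into it, so p[z][y][x] is data[(z*height + y)*width + x]. */
typedef struct {
    int ***p;    /* p[z] -> row table for plane z */
    int  **rows; /* depth*height row pointers     */
    int   *data; /* width*height*depth ints       */
} volume3;

static int volume3_init(volume3 *v, size_t width, size_t height,
                        size_t depth)
{
    assert(width > 0 && height > 0 && depth > 0);
    v->data = calloc(width * height * depth, sizeof *v->data);
    v->rows = malloc(depth * height * sizeof *v->rows);
    v->p    = malloc(depth * sizeof *v->p);
    if (!v->data || !v->rows || !v->p) {
        free(v->data); free(v->rows); free(v->p);
        return -1;
    }
    for (size_t r = 0; r < depth * height; ++r)
        v->rows[r] = v->data + r * width; /* r = z*height + y */
    for (size_t z = 0; z < depth; ++z)
        v->p[z] = v->rows + z * height;
    return 0;
}

static void volume3_free(volume3 *v)
{
    free(v->p); free(v->rows); free(v->data);
}
```

Contiguity helps caching and prefetching, but each p[z][y][x] access still performs two pointer loads before the data load, so flat indexing into v->data remains the cheaper access path.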
What you really want to do is think about the locality of access of your data, and then optimize the computation to match that.
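A small illustration of matching the computation to the layout: both functions below sum the same x-fastest volume, but only the first walks memory with stride 1. The function names are hypothetical. On the GPU, the analogous choice is mapping threadIdx.x to x so that neighbouring threads in a warp read neighbouring addresses, which is the access pattern memory coalescing rewards.

```c
#include <assert.h>
#include <stddef.h>

/* x innermost: consecutive iterations touch consecutive addresses. */
static long sum_x_inner(const int *v, size_t w, size_t h, size_t d)
{
    assert(v != NULL);
    long s = 0;
    for (size_t z = 0; z < d; ++z)
        for (size_t y = 0; y < h; ++y)
            for (size_t x = 0; x < w; ++x)
                s += v[(z * h + y) * w + x];
    return s;
}

/* Same result, but z innermost: consecutive iterations are w*h
   elements apart, so for large volumes each access tends to land on
   a different cache line. */
static long sum_z_inner(const int *v, size_t w, size_t h, size_t d)
{
    assert(v != NULL);
    long s = 0;
    for (size_t x = 0; x < w; ++x)
        for (size_t y = 0; y < h; ++y)
            for (size_t z = 0; z < d; ++z)
                s += v[(z * h + y) * w + x];
    return s;
}
```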