Update certain areas of a CUDA 3D array


Is there a convenient way to copy a smaller 3D array from the host to a specific position in a larger CUDA 3D array on the device? I need CUDA arrays for interpolation, so they are bound to a 3D texture. The 3D device array is meant to be a memory pool for exchangeable 3D datasets of smaller extent (all datasets have the same extent). It would be okay if the datasets piled up along only one direction in the pool. Updates only occur between kernel runs, but there could be many of them. I thought about using a second kernel and surface memory for this, but I’m not sure whether interpolation is supported that way or whether it is the right way to go. In addition, surface memory is only available for compute capability 2.0 and up, and I’m on 1.2, so I would prefer a solution that works there. Also, when using a second kernel to manage the data of a CUDA 3D array, I guess one would have to deal with data reordering according to some kind of space-filling curve, which is probably used to improve the locality of the texture cache.
Your thoughts and recommendations are highly appreciated :smile:


ArrayFire’s N-dimensional subscripting makes this really simple, probably 3-4 lines of code, and it runs fast. You can pass device pointers in and out of ArrayFire too, so it will integrate well with your other code.

Here’s the link of interest for subscripting: http://www.accelereyes.com/arrayfire/c/group__indexing.htm

Hi! Just to be sure, since it wasn’t obvious to me after a first quick read: do you mean I can allocate a 3D array on the device with, for example, extent (n,5,5), then copy a 3D array with extent (5,5,5) from the host (possibly from pinned memory) to position (5,0,0) of the device array, and then read it on the device as usual with three-dimensional addressing through the texture reference by adding the offset, with the texture hardware doing the trilinear interpolation?

With ArrayFire, you can’t copy host contents directly into an offset location on the GPU. You can, however, do the following:

#include <arrayfire.h>

using namespace af;

float h_a[5 * 5 * 5];

// Assign some values to h_a;

array a(5, 5, 5, h_a); // copies h_a from the host to the device

array b = zeros(n, 5, 5);  // where n >= 10, so that both ranges below fit

b(seq(0, 4), span, span) = a; // if you want a to start at index 0

b(seq(5, 9), span, span) = a; // if you want a to start at index 5
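To make clear what the subscripted assignment above does to the data, here is a minimal CPU-only sketch in plain C++. It is not ArrayFire's implementation and makes no CUDA calls. The names `Volume` and `copy_subvolume` are hypothetical, and the layout is assumed to be column-major (first index fastest), which is how ArrayFire orders its arrays.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Dense 3D volume in column-major order: the first index varies fastest.
// (Hypothetical helper type, for illustration only.)
struct Volume {
    std::size_t nx, ny, nz;
    std::vector<float> data;

    Volume(std::size_t x, std::size_t y, std::size_t z, float fill = 0.0f)
        : nx(x), ny(y), nz(z), data(x * y * z, fill) {}

    // Linear offset of element (x, y, z).
    std::size_t index(std::size_t x, std::size_t y, std::size_t z) const {
        return x + nx * (y + ny * z);
    }
    float& at(std::size_t x, std::size_t y, std::size_t z) {
        return data[index(x, y, z)];
    }
    float at(std::size_t x, std::size_t y, std::size_t z) const {
        return data[index(x, y, z)];
    }
};

// Copy all of src into dst so that src(0,0,0) lands on dst(ox,oy,oz).
void copy_subvolume(const Volume& src, Volume& dst,
                    std::size_t ox, std::size_t oy, std::size_t oz) {
    // The whole source block must fit inside the destination.
    assert(ox + src.nx <= dst.nx && oy + src.ny <= dst.ny && oz + src.nz <= dst.nz);
    for (std::size_t z = 0; z < src.nz; ++z) {
        for (std::size_t y = 0; y < src.ny; ++y) {
            // Each run along x is contiguous in both volumes.
            const float* row = src.data.data() + src.index(0, y, z);
            std::copy(row, row + src.nx,
                      dst.data.data() + dst.index(ox, oy + y, oz + z));
        }
    }
}
```

With a 5x5x5 `Volume a` and a 12x5x5 `Volume b`, calling `copy_subvolume(a, b, 5, 0, 0)` corresponds to the `b(seq(5, 9), span, span) = a` line. Note that this only describes a plain linear buffer; it says nothing about the internal layout of a CUDA array bound to a texture.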

I see, but what’s still not clear to me is whether this produces the correct layout for 3D textures. The data behind a texture has to respect alignment, and its addressing may follow a space-filling curve to improve locality.

ArrayFire does contain functions for interpolation. But each of ArrayFire’s functions uses the various kinds of GPU memory internally, and there are no hooks for you to change that. You can always grab device pointers from any ArrayFire “array” and use them elsewhere in custom code that implements textures.

Since 3D surface memory is available in newer CUDA versions, maybe the following approach is possible (very similar to C. Crassin’s GPU cache in GigaVoxels)?

  1. allocate a large 3D CUDA array as memory pool
  2. create a texture reference which meets the requirements (extent matches the complete memory pool, linear interpolation, 16-bit floating-point texture) and bind it to the array
  3. create a surface reference (extent also matches the complete memory pool) and bind it to the array
  4. use a dedicated kernel, which has access to the required data of arbitrary size, to copy it into some arbitrary 3D location of the array by using the surface reference
  5. run the main kernel and access the array through the texture reference (the texture cache is invalidated between kernel launches, so the main kernel sees the updated data)
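
For steps 4 and 5, the copy kernel and the main kernel have to agree on where each dataset sits in the pool. The following is a minimal host-side sketch in plain C++ of one possible bookkeeping scheme. It assumes the pool is divided into equally sized bricks that are numbered with x varying fastest, and the names `Extent3`, `slot_origin` and `texel_center` are hypothetical. The half-texel shift assumes unnormalized texture coordinates with linear filtering, where texel i is returned exactly at coordinate i + 0.5.

```cpp
#include <cassert>
#include <cstddef>

// Three-component size or position, in texels or in bricks.
struct Extent3 {
    std::size_t x, y, z;
};

// Origin (in texels) of brick number `slot` inside the pool.
// poolBricks: number of bricks along each axis of the pool.
// brick:      extent of one brick in texels.
Extent3 slot_origin(std::size_t slot, Extent3 poolBricks, Extent3 brick) {
    assert(slot < poolBricks.x * poolBricks.y * poolBricks.z);
    const std::size_t bx = slot % poolBricks.x;
    const std::size_t by = (slot / poolBricks.x) % poolBricks.y;
    const std::size_t bz = slot / (poolBricks.x * poolBricks.y);
    return {bx * brick.x, by * brick.y, bz * brick.z};
}

// Unnormalized texture coordinate along one axis for a position `local`
// (in texels, relative to the brick origin). The +0.5 addresses texel
// centers, so local = 0 samples the first texel of the brick exactly.
float texel_center(std::size_t origin, float local) {
    return static_cast<float>(origin) + local + 0.5f;
}
```

For example, with a pool of 4x2x2 bricks of 5x5x5 texels each, `slot_origin(7, {4, 2, 2}, {5, 5, 5})` gives the origin (15, 5, 0). One thing to keep in mind with this kind of pool: trilinear interpolation near a brick face also reads texels from the neighbouring brick, so bricks may need a border of duplicated texels.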