Advection field in 3d texture


I need to advect one density field using another vector field, all in 3D. I’d like to do this with 3D textures to take advantage of hardware interpolation.

The problem right now is getting the data from the host into the right format on the device.

On the host I have the vector field as a linear array, so for a 2x2x2 field it would look like

| 0.x | 0.y | 0.z | 1.x | 1.y | 1.z | 2.x | 2.y | 2.z | 3.x | 3.y | 3.z | 4.x | 4.y | 4.z | 5.x | 5.y | 5.z |…

Textures don’t support float3 though, so I need to pad the data out to float4 values. Copying the above array directly gives a wrong result, as there are only three values for each vector. Is there any way to copy these values to a 3D texture of float4 and set the last element of each vector to 0?

The corresponding copy from linear host memory to linear device memory would be:

cudaMemcpy2D(d_Ptr, 4 * sizeof(float), h_Ptr, 3 * sizeof(float), 3 * sizeof(float), fieldWidth * fieldHeight * fieldDepth, cudaMemcpyHostToDevice);

The same array as above would then look like:

| 0.x | 0.y | 0.z | 0 | 1.x | 1.y | 1.z | 0 | 2.x | 2.y | 2.z | 0 | 3.x | 3.y | 3.z | 0 | 4.x | 4.y | 4.z | 0 | 5.x | 5.y | 5.z | 0 |…
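Putting the pieces together, here is a rough sketch of how I imagine the whole transfer could look (dimensions `W`, `H`, `D` and the pointer names are placeholders of my own). One detail worth noting: cudaMemcpy2D only writes the three source floats per element, so the destination buffer has to be zeroed first if the fourth component should come out as 0:

```cuda
#include <cuda_runtime.h>

// h_field points to W*H*D packed float3 vectors on the host.
size_t nVoxels = (size_t)W * H * D;

// 1) Pad float3 -> float4 into linear device memory.
float4 *d_padded;
cudaMalloc((void**)&d_padded, nVoxels * sizeof(float4));
cudaMemset(d_padded, 0, nVoxels * sizeof(float4));   // makes every .w == 0
cudaMemcpy2D(d_padded, 4 * sizeof(float),            // dst, dst pitch
             h_field,  3 * sizeof(float),            // src, src pitch
             3 * sizeof(float), nVoxels,             // width in bytes, rows
             cudaMemcpyHostToDevice);

// 2) Copy the padded linear memory into a 3D cudaArray for texturing.
cudaChannelFormatDesc desc = cudaCreateChannelDesc<float4>();
cudaExtent extent = make_cudaExtent(W, H, D);
cudaArray *array;
cudaMalloc3DArray(&array, &desc, extent);

cudaMemcpy3DParms p = {0};
p.srcPtr   = make_cudaPitchedPtr(d_padded, W * sizeof(float4), W, H);
p.dstArray = array;
p.extent   = extent;
p.kind     = cudaMemcpyDeviceToDevice;
cudaMemcpy3D(&p);
```

This is only how I understand the API from the documentation, not tested code.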

Although I know how to copy memory to and from textures/arrays from looking at the SDK examples, I feel like I haven’t really grasped how it works and am kind of stumbling around at the moment.

If someone knows a good way to copy and realign the values in one call, instead of realigning the memory first and then copying, I’d be really glad to hear it.

Or, if the array representation is linear just like above, maybe someone knows how I can bind a texture directly to that piece of memory?


For the moment I’m just realigning the memory myself, so if anyone knows a better way, I’d be pleased to hear it :)
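For reference, the host-side realignment is just a strided copy. A minimal sketch in plain C (the function name is my own):

```c
#include <stddef.h>

/* Pad an array of n packed xyz-vectors (3 floats each) into xyzw form
 * (4 floats each), writing 0 into every fourth component. */
static void pad_float3_to_float4(const float *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[4 * i + 0] = src[3 * i + 0];
        dst[4 * i + 1] = src[3 * i + 1];
        dst[4 * i + 2] = src[3 * i + 2];
        dst[4 * i + 3] = 0.0f;  /* unused w component */
    }
}
```

The padded buffer can then be sent to the device with a plain cudaMemcpy instead of the pitched cudaMemcpy2D.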

I have another question though. The kernel used for the advection “crashes” when the density fields get too big. It doesn’t really crash, as the program keeps executing, but the values returned from the GPU are all zeroes. I guess it’s because I overstep the global device memory or something, as it works in emulation mode but not otherwise, although I think it “crashes” too early for it to be the global memory limit.

Is there any way to check why it crashes? I’ve tried putting CUT_CHECK_ERROR("kernel failed") after the kernel call and CUDA_SAFE_CALL() around all other CUDA functions. I don’t know if this is the way to do it though.


CUT_CHECK_ERROR and CUDA_SAFE_CALL only check for errors in debug builds.
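One pattern that works in release builds as well is to query the runtime yourself after a launch. A rough sketch (the macro name is my own, not part of the SDK):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report the last CUDA error, release builds included.
#define CHECK_KERNEL(msg)                                                  \
    do {                                                                   \
        cudaError_t err = cudaGetLastError();                              \
        if (err != cudaSuccess)                                            \
            fprintf(stderr, "%s: %s\n", (msg), cudaGetErrorString(err));   \
    } while (0)

// Usage sketch:
//   myKernel<<<grid, block>>>(...);
//   CHECK_KERNEL("myKernel launch");   // catches launch/config errors
//   cudaThreadSynchronize();           // wait so execution errors surface
//   CHECK_KERNEL("myKernel execution");
```

Kernel launches are asynchronous, so checking both before and after the synchronize separates launch failures (bad grid dimensions, too many resources) from errors during execution.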

Ah, I see, thanks…

On another note, does anyone know how much space a 3D array and texture take in global memory?

For a program I’m allocating two 3D textures (one with float and one with float4), where one texture is a density field and the other a vector field to push the densities around. I also have a result array in linear memory of the same size as the density field.

If I allocate all fields with a size of 240x240x240, everything runs smoothly. But if I change to 250x250x250, it runs out of memory on the last cudaMalloc, the one for the resulting field.

I run on a Quadro FX 3700, which has about 512 MB of global memory. If I had everything in linear memory, the program would need about:

( 250 x 250 x 250 * 16 + 250 x 250 x 250 * 4 * 2 ) / 1024 / 1024 ≈ 358 MiB of memory (disregarding any other junk and the small pieces of constant memory allocated)

Can the CUDA arrays really be occupying the remaining ~150 MB of memory, or am I doing something wrong?
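Redoing that arithmetic as a throwaway C helper (the function name is mine; the breakdown assumes one float4 vector field plus a float density field and a float result array):

```c
#include <stddef.h>

/* Total footprint in MiB for three n-voxel fields: one float4 vector
 * field, one float density field, one float result array. */
static double advection_footprint_mib(size_t n)
{
    size_t bytes = n * sizeof(float) * 4   /* float4 vector field  */
                 + n * sizeof(float)       /* float density field  */
                 + n * sizeof(float);      /* float result array   */
    return bytes / (1024.0 * 1024.0);
}

/* advection_footprint_mib(250UL * 250 * 250)  ->  about 357.6 MiB */
```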

Typically, CUDA eats up ~50 MiB. If this GPU is running a display (especially one with a fancy compositing desktop) that usage can be higher. You can check how much memory is free with cuMemGetInfo (see any of a number of recent threads on this topic).
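A minimal sketch of such a query from a runtime API program (note that very old driver versions declared the parameters as unsigned int rather than size_t, so check your toolkit’s cuda.h; link with -lcuda):

```cuda
#include <cstdio>
#include <cuda.h>          // driver API, for cuMemGetInfo
#include <cuda_runtime.h>  // runtime API

int main()
{
    // Touch the runtime first so a CUDA context exists for the driver call.
    cudaFree(0);

    size_t freeBytes = 0, totalBytes = 0;
    cuMemGetInfo(&freeBytes, &totalBytes);
    printf("free: %zu MiB of %zu MiB\n",
           freeBytes >> 20, totalBytes >> 20);
    return 0;
}
```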

Ah, thanks a lot. I’ve been looking for a way to see the available CUDA memory. I started with CUDA just recently, so I haven’t dared to wander into the driver API yet, but I will definitely check it out. I can’t try it right now as I don’t have access to that computer at the moment. Thanks for the help.

Don’t worry about the fact that cuMemGetInfo is a driver API call. It is one of the few driver API calls that can be made from a runtime API program without any problems.

I’ve been programming CUDA since the first public release in March 2007 and I’ve never touched the driver API, except for this one call. The runtime API is just too convenient :)

Ah, that sounds great. I wasn’t really sure how to mix the driver and runtime APIs, but as you mentioned, there doesn’t seem to be any problem so far with that one call. Thanks a lot for the tip, it really helps me out.