1D access vs 3D access


I use 3D cudaArray for reads with texture fetching, and global memory 1D array for writes. I copy 1D arrays to 3D cudaArray for the next timestep.

I access 3D cuda Array with :

[indent]tex3D(tex_vx, i, j, k)[/indent]

I access 1D global memory array with :

[indent]d_vx[k*(sizex)(sizey) + j(sizex) + i][/indent]

Is this coherent ? ie: do i access the same element ?

Thx for your help.

Your 1D access is in column-major order, and I think CUDA arrays are loaded in row-major order.

Um, what? That’s the correct way to access elements in a 3D volume stored in a 1D array. However, if your access pattern is not really random, then there’s no need to read from a 3D texture. You can read in the same fashion as you’re writing.

Seems Okay.


I use cuda arrays for reads in a 4th order finite difference code, the access pattern is not “random” but quite “dense” and can not be completely coalesced. I use 3D cudda array to benefit from spatial and temporal locality of accesses through the cache mechanism.

To copy into 3D array, I set :

const cudaExtent volumeSize = make_cudaExtent(width, height, depth);

copyParams.srcPtr = make_cudaPitchedPtr((void*)d_vx, volumeSize.width*sizeof(float), volumeSize.width, volumeSize.height);

copyParams.dstArray = ca_vx;


I can’t find in the documentation if the cuda Arrays are row major or column major.

The threads blocks are in column-major order (the thread ID of a thread of index (x,y) is (x + y*sizex))

The C arrays are in row major.

On one hand I guess the column-major order would be more coherent, since thread blocks are column major (to avoid coordinates transformation for each thread accessing an array element), on the other hand, row major order would be more coherent with C (the order the arrays are stored on the host).

some say it’s good, some not, who believe ? :blink:

I usually store global 1D array with a column-major pattern, because when I access my1Darray[ksizexsizey + j*sizex + i] with

int i = blockIdx.x*blockDim.x + threadIdx.x;

int j = blockIdx.y*blockDim.y + threadIdx.y;

int k = blockIdx.z*blockDim.z + threadIdx.z;

accesses are coalesced.

Majority has to do with which spatial index is changing ‘fastest’ as you progress linearly through the data as it is laid out in memory. If we assume that row-major in the context of 3D means x changes fastest, then y, and z changes the slowest, then your last example is actually row-major. Your example with thread indices is also row-major as the x index is changing faster than the y index.

Also, you don’t access memory in a cudaArray directly. You access it via textures, which the prorgamming guide tells you will be spatially coherent. Thus it shouldn’t matter to you how the data is internally stored in an array provided your texture lookups return the expected values.