Hello,

I have been thinking about how to get maximum scaling and flexibility while still offering a simple linear index, so that a kernel can use one final index to address each data element uniquely in one large 1D array:

// general indexing formulas:

ThreadWidth = blockDim.x;

ThreadHeight = blockDim.y;

ThreadDepth = blockDim.z;

ThreadArea = ThreadWidth * ThreadHeight;

ThreadVolume = ThreadDepth * ThreadArea;

ThreadIndex = (threadIdx.z * ThreadArea) + (threadIdx.y * ThreadWidth) + threadIdx.x;

BlockWidth = gridDim.x;

BlockHeight = gridDim.y;

BlockDepth = gridDim.z;

BlockArea = BlockWidth * BlockHeight;

BlockVolume = BlockDepth * BlockArea;

BlockIndex = (blockIdx.z * BlockArea) + (blockIdx.y * BlockWidth) + blockIdx.x;

FinalIndex = (BlockIndex * ThreadVolume) + ThreadIndex;
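As a quick sanity check of the formulas above, here is a minimal host-side C++ sketch that evaluates them for one thread on the CPU. The `Dim3` struct and the `FinalIndexOf` function are hypothetical stand-ins I made up for illustration (they are not CUDA types or built-ins), so treat this as a sketch rather than kernel code.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3/uint3; not the real CUDA type.
struct Dim3 { int x, y, z; };

// Evaluates the general indexing formulas for one thread of one block.
// thread/block play the role of threadIdx/blockIdx,
// blockSize/gridSize play the role of blockDim/gridDim.
int FinalIndexOf(Dim3 thread, Dim3 block, Dim3 blockSize, Dim3 gridSize)
{
    // Every dimension must be at least 1, otherwise the products become zero.
    assert(blockSize.x > 0 && blockSize.y > 0 && blockSize.z > 0);
    assert(gridSize.x > 0 && gridSize.y > 0 && gridSize.z > 0);

    int ThreadArea   = blockSize.x * blockSize.y;
    int ThreadVolume = blockSize.z * ThreadArea;
    int ThreadIndex  = (thread.z * ThreadArea) + (thread.y * blockSize.x) + thread.x;

    int BlockArea  = gridSize.x * gridSize.y;
    int BlockIndex = (block.z * BlockArea) + (block.y * gridSize.x) + block.x;

    return (BlockIndex * ThreadVolume) + ThreadIndex;
}
```

Worked by hand for a 2x2x1 grid of 4x2x1 blocks: thread (3,1,0) of block (1,1,0) has ThreadIndex = 1*4 + 3 = 7 and BlockIndex = 1*2 + 1 = 3, so FinalIndex = 3*8 + 7 = 31, which is the last of the 32 elements.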

A usage example:

Call/launch example:

dim3 GridDim( BlocksInXDirection, BlocksInYDirection, BlocksInZDirection ); // X*Y*Z = total blocks

dim3 BlockDim( ThreadsInXDirection, ThreadsInYDirection, ThreadsInZDirection ); // X*Y*Z = total threads per block, must be less than or equal to the threads-per-block limit (see GPU specs)

KernelLaunch<<<GridDim, BlockDim>>>( … );

// ^ (threads in total = total blocks * total threads per block (same as FinalVolume) )
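The threads-per-block constraint mentioned in the launch comments can be checked on the host before launching. Below is a minimal sketch; `Dim3`, `ThreadsPerBlock` and `FitsThreadLimit` are hypothetical names of my own, and the limit is passed in as a parameter because it depends on the GPU (1024 is a common value on current NVIDIA hardware, but check your own device).

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3; not the real CUDA type.
struct Dim3 { int x, y, z; };

// Total threads in one block: X * Y * Z.
long long ThreadsPerBlock(Dim3 blockSize)
{
    assert(blockSize.x > 0 && blockSize.y > 0 && blockSize.z > 0);
    return static_cast<long long>(blockSize.x) * blockSize.y * blockSize.z;
}

// True if the block shape stays within the given threads-per-block limit.
// maxThreadsPerBlock is an assumption supplied by the caller (see GPU specs).
bool FitsThreadLimit(Dim3 blockSize, long long maxThreadsPerBlock)
{
    return ThreadsPerBlock(blockSize) <= maxThreadsPerBlock;
}
```

For example, an 8x8x8 block has 512 threads and fits a limit of 1024, while a 16x16x8 block has 2048 threads and does not.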

Inside kernel:

MemoryCell[ FinalIndex ] = …;

Element[ FinalIndex ] = …;

These are just very big arrays/memories which can be accessed in a 6-dimensional way, thanks to the general indexing formulas! :)

So now your problems/solutions should be able to scale up to 6 dimensions, as far as the hardware allows, and still offer easy 1D array programming! :)

Let me know how it works out for you!

To determine how much memory to allocate for these arrays, this formula is also handy:

FinalVolume = (BlockVolume * ThreadVolume);

Usage example:

malloc/allocate/getmem/etc( …Pointer… , SizeOf(ElementType) * FinalVolume );
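To convince myself that an array of FinalVolume elements is exactly the right size, the launch can be emulated on the CPU: visit every (block, thread) pair, compute FinalIndex, and count how often each element is touched. This is only a sketch under my own assumptions; `Dim3`, `VolumeOf`, `FinalVolumeOf` and `EveryIndexHitOnce` are hypothetical names, and `std::size_t` is used for the sizes because the product can overflow a plain `int` for large grids.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's dim3; not the real CUDA type.
struct Dim3 { int x, y, z; };

// Width * Height * Depth of one grid or block, computed in size_t.
std::size_t VolumeOf(Dim3 d)
{
    assert(d.x > 0 && d.y > 0 && d.z > 0);
    return static_cast<std::size_t>(d.x) * d.y * d.z;
}

// FinalVolume = BlockVolume * ThreadVolume.
std::size_t FinalVolumeOf(Dim3 gridSize, Dim3 blockSize)
{
    return VolumeOf(gridSize) * VolumeOf(blockSize);
}

// Emulates the launch on the CPU and returns true if every element of a
// FinalVolume-sized array is addressed exactly once by FinalIndex.
bool EveryIndexHitOnce(Dim3 gridSize, Dim3 blockSize)
{
    std::vector<int> Hits(FinalVolumeOf(gridSize, blockSize), 0);
    const std::size_t ThreadArea   = static_cast<std::size_t>(blockSize.x) * blockSize.y;
    const std::size_t ThreadVolume = VolumeOf(blockSize);
    const std::size_t BlockArea    = static_cast<std::size_t>(gridSize.x) * gridSize.y;

    for (int bz = 0; bz < gridSize.z; ++bz)
    for (int by = 0; by < gridSize.y; ++by)
    for (int bx = 0; bx < gridSize.x; ++bx)
    for (int tz = 0; tz < blockSize.z; ++tz)
    for (int ty = 0; ty < blockSize.y; ++ty)
    for (int tx = 0; tx < blockSize.x; ++tx)
    {
        std::size_t BlockIndex  = bz * BlockArea + by * gridSize.x + bx;
        std::size_t ThreadIndex = tz * ThreadArea + ty * blockSize.x + tx;
        ++Hits[BlockIndex * ThreadVolume + ThreadIndex];
    }

    for (int h : Hits)
        if (h != 1) return false;   // an element was skipped or written twice
    return true;
}
```

On the GPU the same FinalVolume would be multiplied by the element size and passed to whatever allocator you use.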

Additional note:

Each “InDirection” value in GridDim and BlockDim should always be at least 1, otherwise the products in the calculations become zero.

To use these formulas in an arbitrary way, to manipulate and calculate new linear indexes, the following function could be used:

// could also be called 6Dto1D (as a real identifier it would need a leading letter)

```
int LinearIndexFrom6D
(
    int ThreadX, int ThreadY, int ThreadZ, int BlockX, int BlockY, int BlockZ,
    int ThreadWidth, int ThreadHeight, int ThreadDepth, int BlockWidth, int BlockHeight, int BlockDepth
)
{
    // BlockDepth is not needed for the index itself; it is kept for symmetry.
    int ThreadArea = ThreadWidth * ThreadHeight;
    int ThreadVolume = ThreadDepth * ThreadArea;
    int ThreadIndex = (ThreadZ * ThreadArea) + (ThreadY * ThreadWidth) + ThreadX;
    int BlockArea = BlockWidth * BlockHeight;
    int BlockIndex = (BlockZ * BlockArea) + (BlockY * BlockWidth) + BlockX;
    int FinalIndex = (BlockIndex * ThreadVolume) + ThreadIndex;
    return FinalIndex;
}
```

Once the 6D to 1D linear address is calculated, it can be converted back to an index of any other dimensionality.

For example 1D back to 3D for volume rendering:

// could also be called 1Dto3D

```
// Depth is not needed for the conversion; it is kept so the caller can bounds-check Z.
void IndexFromLinear3D( int LinearIndex, int &X, int &Y, int &Z, int Width, int Height, int Depth )
{
    int Area = Width * Height;

    Z = LinearIndex / Area;
    Y = (LinearIndex - (Z * Area)) / Width;
    X = (LinearIndex - (Z * Area)) - (Y * Width);
}
```

Example usage:

FinalIndex = LinearIndexFrom6D( threadIdx.x, threadIdx.y, threadIdx.z, blockIdx.x, blockIdx.y, blockIdx.z, blockDim.x, blockDim.y, blockDim.z, gridDim.x, gridDim.y, gridDim.z );

IndexFromLinear3D( FinalIndex, VolumeX, VolumeY, VolumeZ, VolumeWidth, VolumeHeight, VolumeDepth ); // VolumeWidth/VolumeHeight/VolumeDepth: dimensions of the target volume

Volume3D[ VolumeZ ][ VolumeY ][ VolumeX ] = …; // special 3D structure: pointer array to pointer array to element array.
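The 1D to 3D conversion can be checked with a round trip: convert every linear index to (X, Y, Z) and back, and confirm you get the same index again. The sketch below is my own host-side C++ version; `Index3D`, `From1DTo3D`, `From3DTo1D` and `RoundTrips` are hypothetical names, and it uses `%` in place of the subtractions, which should give the same result for non-negative indexes.

```cpp
#include <cassert>

// Hypothetical helper type holding one 3D coordinate.
struct Index3D { int X, Y, Z; };

// 1D -> 3D, written with / and % instead of explicit subtractions.
Index3D From1DTo3D(int LinearIndex, int Width, int Height)
{
    assert(Width > 0 && Height > 0 && LinearIndex >= 0);
    int Area = Width * Height;
    Index3D p;
    p.Z = LinearIndex / Area;
    p.Y = (LinearIndex % Area) / Width;
    p.X = LinearIndex % Width;
    return p;
}

// 3D -> 1D, the inverse mapping.
int From3DTo1D(Index3D p, int Width, int Height)
{
    return (p.Z * Width * Height) + (p.Y * Width) + p.X;
}

// True if every index in a Width x Height x Depth volume survives the
// 1D -> 3D -> 1D round trip and stays inside the volume bounds.
bool RoundTrips(int Width, int Height, int Depth)
{
    for (int i = 0; i < Width * Height * Depth; ++i)
    {
        Index3D p = From1DTo3D(i, Width, Height);
        if (p.X >= Width || p.Y >= Height || p.Z >= Depth) return false;
        if (From3DTo1D(p, Width, Height) != i) return false;
    }
    return true;
}
```

For example, linear index 31 in a volume of width 4 and height 2 maps to X = 3, Y = 1, Z = 3, since 31 = 3*8 + 1*4 + 3.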

(All pseudocode is untested, but in theory it should work.)