I've been thinking about how to get maximum scaling and flexibility out of a launch while still giving each thread a simple linear index, so that a kernel can use one final index to address a unique element in one large 1D array:
// general indexing formulas (inside a kernel; the CUDA built-ins are lower-case):
int ThreadWidth  = blockDim.x;
int ThreadHeight = blockDim.y;
int ThreadDepth  = blockDim.z;
int ThreadArea   = ThreadWidth * ThreadHeight;
int ThreadVolume = ThreadDepth * ThreadArea;
int ThreadIndex  = (threadIdx.z * ThreadArea) + (threadIdx.y * ThreadWidth) + threadIdx.x;
int BlockWidth   = gridDim.x;  // grid width, measured in blocks
int BlockHeight  = gridDim.y;
int BlockDepth   = gridDim.z;
int BlockArea    = BlockWidth * BlockHeight;
int BlockVolume  = BlockDepth * BlockArea;
int BlockIndex   = (blockIdx.z * BlockArea) + (blockIdx.y * BlockWidth) + blockIdx.x;
int FinalIndex   = (BlockIndex * ThreadVolume) + ThreadIndex;
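To sanity-check the formulas without a GPU, here is a small host-side C++ sketch. `Dim3` is a hypothetical stand-in for CUDA's `dim3`/`uint3` built-ins and `FinalIndexOf` is an illustrative name; in a real kernel you would read `threadIdx`, `blockDim`, `blockIdx` and `gridDim` directly.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3/uint3 built-ins, so the
// arithmetic can be checked on the host without a GPU.
struct Dim3 { int x, y, z; };

// The general indexing formulas from the text, with
//   thread ~ threadIdx, blockSize ~ blockDim,
//   block  ~ blockIdx,  gridSize  ~ gridDim.
int FinalIndexOf( Dim3 thread, Dim3 blockSize, Dim3 block, Dim3 gridSize )
{
    // Every "InDirection" must be at least 1, otherwise the products collapse to 0.
    assert( blockSize.x >= 1 && blockSize.y >= 1 && blockSize.z >= 1 );
    assert( gridSize.x >= 1 && gridSize.y >= 1 && gridSize.z >= 1 );

    int ThreadWidth = blockSize.x, ThreadHeight = blockSize.y, ThreadDepth = blockSize.z;
    int ThreadArea   = ThreadWidth * ThreadHeight;
    int ThreadVolume = ThreadDepth * ThreadArea;
    int ThreadIndex  = (thread.z * ThreadArea) + (thread.y * ThreadWidth) + thread.x;

    // gridSize.z (BlockDepth) is not needed here: it only limits block.z.
    int BlockWidth = gridSize.x, BlockHeight = gridSize.y;
    int BlockArea  = BlockWidth * BlockHeight;
    int BlockIndex = (block.z * BlockArea) + (block.y * BlockWidth) + block.x;

    return (BlockIndex * ThreadVolume) + ThreadIndex;
}
```

Worked by hand: with gridDim (2,3,4) and blockDim (8,4,2), thread (3,1,1) of block (1,2,3) gets ThreadIndex 1*32 + 1*8 + 3 = 43 and BlockIndex 3*6 + 2*2 + 1 = 23, so FinalIndex = 23 * 64 + 43 = 1515, out of 24 * 64 = 1536 threads in total.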
So a usage example is:
dim3 Grid( BlocksInXDirection, BlocksInYDirection, BlocksInZDirection );    // X * Y * Z = total blocks
dim3 Block( ThreadsInXDirection, ThreadsInYDirection, ThreadsInZDirection ); // X * Y * Z = threads per block; must not exceed the threads-per-block limit (see GPU specs: maxThreadsPerBlock, maxThreadsDim)
// ^ total threads = total blocks * threads per block (same as FinalVolume)
YourKernel<<< Grid, Block >>>( /* your arguments */ );                      // YourKernel is a placeholder name
// inside the kernel:
MemoryCell[ FinalIndex ] = …;
Element[ FinalIndex ] = …;
These are just very big arrays/memories that can be accessed in a 6-dimensional way, thanks to the general indexing formulas! :)
So your problems/solutions can now scale up to 6 dimensions, as far as the hardware allows, while still offering easy 1D array programming! :)
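On the GPU every thread of the launch runs the kernel body once. The sketch below emulates that on the host with six nested loops, to show that for a given configuration each element from 0 to FinalVolume - 1 is written exactly once (no gaps, no collisions). `Dim3` and `EmulateLaunch` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <vector>

struct Dim3 { int x, y, z; };  // hypothetical stand-in for CUDA's dim3

// Emulates a <<<grid, block>>> launch on the host: every (block, thread)
// pair computes its FinalIndex and marks Element[FinalIndex] as written.
// Returns how many times each element was written.
std::vector<int> EmulateLaunch( Dim3 grid, Dim3 block )
{
    assert( grid.x >= 1 && grid.y >= 1 && grid.z >= 1 );
    assert( block.x >= 1 && block.y >= 1 && block.z >= 1 );
    int ThreadVolume = block.x * block.y * block.z;
    int BlockVolume  = grid.x * grid.y * grid.z;
    std::vector<int> WriteCount( BlockVolume * ThreadVolume, 0 );

    for (int bz = 0; bz < grid.z; ++bz)
    for (int by = 0; by < grid.y; ++by)
    for (int bx = 0; bx < grid.x; ++bx)
    for (int tz = 0; tz < block.z; ++tz)
    for (int ty = 0; ty < block.y; ++ty)
    for (int tx = 0; tx < block.x; ++tx)
    {
        int ThreadIndex = (tz * block.x * block.y) + (ty * block.x) + tx;
        int BlockIndex  = (bz * grid.x * grid.y) + (by * grid.x) + bx;
        int FinalIndex  = (BlockIndex * ThreadVolume) + ThreadIndex;
        WriteCount.at( FinalIndex ) += 1;  // .at() throws if an index were out of range
    }
    return WriteCount;
}
```

If any count were 0 or greater than 1, two threads would be sharing an element or an element would be left untouched; with these formulas every count comes out as exactly 1.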
Let me know how it works out for you !
To determine how much memory to allocate for these arrays, this formula is also handy:
FinalVolume = (BlockVolume * ThreadVolume);
cudaMalloc( (void**)&Pointer, sizeof(ElementType) * (size_t)FinalVolume ); // or malloc/new/getmem/etc. for host memory
Each “InDirection” value of the grid and block dimensions must be at least 1; otherwise the products above are zeroed out (and a zero would make the launch configuration invalid anyway).
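One thing worth watching with the allocation formula: BlockVolume * ThreadVolume * sizeof(element) can easily exceed the range of a 32-bit `int` for large grids. A minimal sketch, assuming a 64-bit host, that does the product in `size_t` and checks the "at least 1" rule; `AllocationBytes` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>

// Number of bytes needed so that every FinalIndex of the launch is a valid
// element index. Computed in size_t because the product can overflow int.
std::size_t AllocationBytes( std::size_t BlockWidth, std::size_t BlockHeight, std::size_t BlockDepth,
                             std::size_t ThreadWidth, std::size_t ThreadHeight, std::size_t ThreadDepth,
                             std::size_t ElementSize )
{
    // A zero extent would zero out the whole product.
    assert( BlockWidth >= 1 && BlockHeight >= 1 && BlockDepth >= 1 );
    assert( ThreadWidth >= 1 && ThreadHeight >= 1 && ThreadDepth >= 1 );

    std::size_t BlockVolume  = BlockWidth * BlockHeight * BlockDepth;
    std::size_t ThreadVolume = ThreadWidth * ThreadHeight * ThreadDepth;
    std::size_t FinalVolume  = BlockVolume * ThreadVolume;
    return ElementSize * FinalVolume;
}
```

For example, a 65535 x 1024 x 1 grid of 1024-thread blocks holds 68,718,428,160 elements, far beyond what an `int` can count; in `size_t` the byte count for 4-byte elements comes out as 274,873,712,640.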
To use these formulas in an arbitrary way, for example to manipulate coordinates and calculate new linear indexes, the following function could be used:
// could also be called Index6DTo1D (a C/C++ name can't start with a digit, so not 6Dto1D)
__host__ __device__ int LinearIndex( int ThreadX, int ThreadY, int ThreadZ, int BlockX, int BlockY, int BlockZ,
                                     int ThreadWidth, int ThreadHeight, int ThreadDepth,
                                     int BlockWidth, int BlockHeight, int BlockDepth ) // BlockDepth only bounds BlockZ
{
    int ThreadArea   = ThreadWidth * ThreadHeight;
    int ThreadVolume = ThreadDepth * ThreadArea;
    int ThreadIndex  = (ThreadZ * ThreadArea) + (ThreadY * ThreadWidth) + ThreadX;
    int BlockArea    = BlockWidth * BlockHeight;
    int BlockIndex   = (BlockZ * BlockArea) + (BlockY * BlockWidth) + BlockX;
    int FinalIndex   = (BlockIndex * ThreadVolume) + ThreadIndex;
    return FinalIndex;
}
Once the 6D-to-1D linear address has been calculated, it can be converted back into coordinates of any other dimensionality, as long as the index is smaller than the number of elements of the target shape.
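The reason this works is that FinalIndex is really a mixed-radix number whose "digits" are (threadIdx.x, threadIdx.y, threadIdx.z, blockIdx.x, blockIdx.y, blockIdx.z) with radices (blockDim.x, blockDim.y, blockDim.z, gridDim.x, gridDim.y, gridDim.z). A generic sketch for an arbitrary shape, using `%` and `/` instead of the subtraction form; `LinearToCoords` and `CoordsToLinear` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits a linear index into coordinates for an arbitrary shape.
// Extents[0] varies fastest (the "X"/Width of the text), Extents[1] next, ...
// This is mixed-radix digit extraction: digit d = index % Extents[d].
std::vector<int> LinearToCoords( int LinearIndex, const std::vector<int>& Extents )
{
    assert( LinearIndex >= 0 );
    std::vector<int> Coords( Extents.size() );
    for (std::size_t d = 0; d < Extents.size(); ++d)
    {
        assert( Extents[d] >= 1 );
        Coords[d]    = LinearIndex % Extents[d];
        LinearIndex /= Extents[d];
    }
    assert( LinearIndex == 0 );  // the index must be smaller than the product of Extents
    return Coords;
}

// Inverse: rebuild the linear index, slowest dimension first (Horner's scheme).
int CoordsToLinear( const std::vector<int>& Coords, const std::vector<int>& Extents )
{
    assert( Coords.size() == Extents.size() );
    int LinearIndex = 0;
    for (std::size_t d = Extents.size(); d-- > 0; )
        LinearIndex = LinearIndex * Extents[d] + Coords[d];
    return LinearIndex;
}
```

With extents { 8, 4, 2, 2, 3, 4 } (blockDim.xyz, then gridDim.xyz), index 1515 splits back into thread (3,1,1) of block (1,2,3); the same index reshaped into a 16 x 16 x 6 volume gives (11, 14, 5).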
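For example, 1D back to 3D for volume rendering:
// could also be called LinearTo3D (a C/C++ name can't start with a digit, so not 1Dto3D)
__host__ __device__ void Index3DFromLinear( int LinearIndex, int &X, int &Y, int &Z, int Width, int Height, int Depth )
{
    int Area = Width * Height;  // Depth only bounds Z: LinearIndex must be < Area * Depth
    Z = LinearIndex / Area;
    Y = (LinearIndex - (Z * Area)) / Width;
    X = (LinearIndex - (Z * Area)) - (Y * Width);
}
// inside a kernel (VolumeWidth/VolumeHeight/VolumeDepth = size of the volume, e.g. kernel arguments):
int FinalIndex = LinearIndex( threadIdx.x, threadIdx.y, threadIdx.z, blockIdx.x, blockIdx.y, blockIdx.z,
                              blockDim.x, blockDim.y, blockDim.z, gridDim.x, gridDim.y, gridDim.z );
if ( FinalIndex < VolumeWidth * VolumeHeight * VolumeDepth ) // skip surplus threads if the grid is larger than the volume
{
    int VolumeX, VolumeY, VolumeZ;
    Index3DFromLinear( FinalIndex, VolumeX, VolumeY, VolumeZ, VolumeWidth, VolumeHeight, VolumeDepth );
    Volume3D[ VolumeZ ][ VolumeY ][ VolumeX ] = …; // special 3D structure: pointer array to pointer array to element array
}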
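A quick host-side check that the 1D-to-3D formulas really invert the 3D-to-1D formula for every index of a volume. It keeps the subtraction form used above; `Linear3D` and `RoundTripHolds` are hypothetical helpers added only for the check.

```cpp
#include <cassert>

// 1D -> 3D, same arithmetic as the text (subtraction instead of %).
void Index3DFromLinear( int LinearIndex, int &X, int &Y, int &Z,
                        int Width, int Height, int Depth )
{
    assert( Width >= 1 && Height >= 1 && Depth >= 1 );
    assert( LinearIndex >= 0 && LinearIndex < Width * Height * Depth );
    int Area = Width * Height;
    Z = LinearIndex / Area;
    Y = (LinearIndex - (Z * Area)) / Width;
    X = (LinearIndex - (Z * Area)) - (Y * Width);
}

// 3D -> 1D, the inverse (hypothetical helper for the check below).
int Linear3D( int X, int Y, int Z, int Width, int Height )
{
    return (Z * Width * Height) + (Y * Width) + X;
}

// True if every index of a Width x Height x Depth volume survives a
// 1D -> 3D -> 1D round trip with in-range coordinates.
bool RoundTripHolds( int Width, int Height, int Depth )
{
    for (int i = 0; i < Width * Height * Depth; ++i)
    {
        int X, Y, Z;
        Index3DFromLinear( i, X, Y, Z, Width, Height, Depth );
        if (X < 0 || X >= Width || Y < 0 || Y >= Height || Z < 0 || Z >= Depth) return false;
        if (Linear3D( X, Y, Z, Width, Height ) != i) return false;
    }
    return true;
}
```

For instance, index 1515 in a 16 x 16 x 6 volume maps to Z = 1515 / 256 = 5, Y = (1515 - 1280) / 16 = 14, X = 235 - 224 = 11.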
(All pseudocode, untested, but in theory it should work.)