Hi there, I've been going through the NVIDIA CUDA Programming Guide, the Best Practices Guide, and the forum, trying to find out how to allocate a 3D array and access it with indices just like in C++ (array[i][j][k]). The reason is that I have about 6000 lines of code that I converted from MATLAB to C++, and now I have to get it working on the GPU (an EVGA GeForce GTX 285). 99% of the code is loops over 3D arrays. Here is an example:

    for (k = P1; k <= KP; k++)
      for (j = P1; j <= JP; j++)
        for (i = 3; i <= I_1; i++) {
          Hx[i][j][k] = dax[i][j][k]*Hx[i][j][k] +
            dx1[i][j][k]*(Ey[i][j][k+1]-Ey[i][j][k]-
                          Ez[i][j+1][k]+Ez[i][j][k]) +
            db2x[i][j][k]*(Ey[i][j][k+2]-Ey[i][j][k-1]-
                           Ez[i][j+2][k]+Ez[i][j-1][k]) +
            dx3[i][j][k]*(Ey[i][j+1][k+2]+Ey[i][j-1][k+2]+
                          Ez[i][j+2][k]+Ez[i][j-1][k]) +
            dx3[i][j][k]*(Ey[i+1][j][k+2]+Ey[i-1][j][k+2]-
                          Ey[i][j+1][k-1]-Ey[i][j-1][k-1]-
                          Ey[i+1][j][k-1]-Ey[i-1][j][k-1]-
                          Ez[i+1][j+2][k]-Ez[i-1][j+2][k]-
                          Ez[i][j+2][k+1]-Ez[i][j+2][k-1]+
                          Ez[i+1][j-1][k]+Ez[i-1][j-1][k]+
                          Ez[i][j-1][k+1]+Ez[i][j-1][k-1]) +
            dx4[i][j][k]*(Ey[i+1][j+1][k+2]+Ey[i-1][j+1][k+2]+
                          Ez[i+1][j-1][k-1]+Ez[i-1][j-1][k-1]);
        }

I have around 500 loops like this, and I am very confused. I don't know whether I should use CUDA arrays (which, as I read in the guide, can only be fetched through textures, while my arrays are read/write) or cudaMalloc (which I think will fail when copying from the host, since I would be copying host pointers rather than the data they point to).

This is the function I use to allocate and initialize the arrays in C++. I think this can't be done in CUDA unless I do it directly on the GPU (i.e. without allocating on the host and then copying to the GPU), but I don't know whether that is possible.

```
#include <cstdlib>
#include <iostream>
using namespace std;

// Allocates the y and z levels of an x*y*z float array and fills it with val.
// The caller must already have allocated arr3d as an array of x float** entries.
void Malloc3dVal(float ***arr3d, int x, int y, int z, float val) {
    int i, j, k;
    if (arr3d == NULL) {
        cout << "Memory allocation failed. Exiting...." << endl;
        return;
    }
    for (i = 0; i < x; i++) {
        /* Allocate 'y' pointers, one per row of floats */
        arr3d[i] = (float **)malloc(y * sizeof(**arr3d));
        if (arr3d[i] == NULL) { /* Validate malloc's success/failure using the return value */
            cout << "Memory allocation failed. Exiting...." << endl;
            return;
        }
        for (j = 0; j < y; j++) {
            /* Allocate 'z' floats */
            arr3d[i][j] = (float *)malloc(z * sizeof(***arr3d));
            if (arr3d[i][j] == NULL) { /* Validate malloc's success/failure using the return value */
                cout << "Memory allocation failed. Exiting...." << endl;
                return;
            }
            for (k = 0; k < z; k++) {
                arr3d[i][j][k] = val;
            }
        }
    }
}
```
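For reference, here is a minimal sketch of one way I could change the allocation so that all the floats sit in a single block while the host code keeps the arr[i][j][k] syntax. `Contig3D`, `allocContig3D` and `freeContig3D` are names I made up for illustration, not part of any library. I am assuming that only the `data` block would be worth copying to the GPU, since the two pointer tables hold host addresses that mean nothing on the device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical layout: one data block plus two pointer tables on top of it.
struct Contig3D {
    float*** at;    // at[i][j][k], same syntax as the triple-pointer version
    float**  rows;  // x*y pointers, one per (i, j) row
    float*   data;  // x*y*z floats in a single allocation
};

// Returns a struct with all members NULL if any allocation fails.
inline Contig3D allocContig3D(int x, int y, int z, float val) {
    assert(x > 0 && y > 0 && z > 0);
    Contig3D a = {NULL, NULL, NULL};
    const std::size_t nRows = static_cast<std::size_t>(x) * y;
    a.data = static_cast<float*>(std::malloc(nRows * z * sizeof(float)));
    a.rows = static_cast<float**>(std::malloc(nRows * sizeof(float*)));
    a.at   = static_cast<float***>(std::malloc(x * sizeof(float**)));
    if (!a.data || !a.rows || !a.at) {
        std::free(a.data);  // free(NULL) is a no-op
        std::free(a.rows);
        std::free(a.at);
        a.data = NULL; a.rows = NULL; a.at = NULL;
        return a;
    }
    // Row r = i*y + j starts z floats after the previous row.
    for (std::size_t r = 0; r < nRows; ++r) a.rows[r] = a.data + r * z;
    // Plane i starts y row pointers after the previous plane.
    for (int i = 0; i < x; ++i) a.at[i] = a.rows + static_cast<std::size_t>(i) * y;
    for (std::size_t n = 0; n < nRows * z; ++n) a.data[n] = val;
    return a;
}

inline void freeContig3D(Contig3D& a) {
    std::free(a.at);
    std::free(a.rows);
    std::free(a.data);
    a.at = NULL; a.rows = NULL; a.data = NULL;
}
```

With this layout the existing host loops would not need to change, but the device code would presumably still have to index `data` directly.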

////////////////////////////////////////////////////////////////

Or maybe I should convert all the 3D arrays to big 1D arrays and do the index calculation myself (which means I have to edit the whole 6000 lines again).
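To make the 1D option concrete, this is roughly what I have in mind, as a sketch only. `Array3D` and `curlTerm` are hypothetical names of my own. The layout assumption is that k varies fastest, matching arr[i][j][k], so every `arr[i][j][k]` in the code would become `arr(i, j, k)`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper (not from any library): a 3D view over one flat buffer.
struct Array3D {
    int nx, ny, nz;           // extents in i, j, k
    std::vector<float> data;  // nx*ny*nz floats in one contiguous block

    Array3D(int x, int y, int z, float val)
        : nx(x), ny(y), nz(z),
          data(static_cast<std::size_t>(x) * y * z, val) {}

    // Flat offset of element (i, j, k); k varies fastest.
    std::size_t index(int i, int j, int k) const {
        assert(i >= 0 && i < nx && j >= 0 && j < ny && k >= 0 && k < nz);
        return (static_cast<std::size_t>(i) * ny + j) * nz + k;
    }

    float& operator()(int i, int j, int k)       { return data[index(i, j, k)]; }
    float  operator()(int i, int j, int k) const { return data[index(i, j, k)]; }
};

// The first bracketed term of the Hx update, rewritten with the accessor.
inline float curlTerm(const Array3D& Ey, const Array3D& Ez, int i, int j, int k) {
    return Ey(i, j, k + 1) - Ey(i, j, k) - Ez(i, j + 1, k) + Ez(i, j, k);
}
```

The edit from `[i][j][k]` to `(i, j, k)` looks mechanical enough for search and replace, and `data.data()` gives a single pointer to a contiguous block, which is the shape a single cudaMalloc and cudaMemcpy pair expects.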

Another issue: as mentioned above, I have almost 500 loops. The loops depend on each other, but within a loop each iteration is independent of the previous ones. Should I write 500 kernels? Each loop is different from the others, so I can't write one kernel that replaces many loops.
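To show what I mean by independent iterations, here is a host-only C++ sketch of one loop nest written as a per-element function. It is simplified: only the first two terms of the Hx update are kept, and the boundary guard is my own stand-in for the real loop limits. `Grid`, `flat`, `updateHxPoint` and `updateHx` are hypothetical names. My assumption is that in CUDA the `for` loop in `updateHx` would turn into a kernel launch, with `idx` computed from `blockIdx.x * blockDim.x + threadIdx.x`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical grid description; the names are illustrative only.
struct Grid { int nx, ny, nz; };

// Flat offset of (i, j, k) with k varying fastest.
inline std::size_t flat(const Grid& g, int i, int j, int k) {
    return (static_cast<std::size_t>(i) * g.ny + j) * g.nz + k;
}

// Work for a single flat index: roughly what one GPU thread would do.
inline void updateHxPoint(const Grid& g, std::size_t idx,
                          float* Hx, const float* dax, const float* dx1,
                          const float* Ey, const float* Ez) {
    // Recover (i, j, k) from the flat index.
    const int k = static_cast<int>(idx % g.nz);
    const int j = static_cast<int>((idx / g.nz) % g.ny);
    const int i = static_cast<int>(idx / (static_cast<std::size_t>(g.nz) * g.ny));
    // Skip points whose j+1 or k+1 neighbour would fall outside the arrays.
    if (i >= g.nx || j + 1 >= g.ny || k + 1 >= g.nz) return;
    // Hx[idx] depends only on its own old value and on Ey/Ez, never on other Hx.
    Hx[idx] = dax[idx] * Hx[idx]
            + dx1[idx] * (Ey[flat(g, i, j, k + 1)] - Ey[idx]
                        - Ez[flat(g, i, j + 1, k)] + Ez[idx]);
}

// Host stand-in for a kernel launch: the order of idx values does not matter.
inline void updateHx(const Grid& g, std::vector<float>& Hx,
                     const std::vector<float>& dax, const std::vector<float>& dx1,
                     const std::vector<float>& Ey, const std::vector<float>& Ez) {
    const std::size_t n = static_cast<std::size_t>(g.nx) * g.ny * g.nz;
    for (std::size_t idx = 0; idx < n; ++idx)
        updateHxPoint(g, idx, Hx.data(), dax.data(), dx1.data(), Ey.data(), Ez.data());
}
```

If this mapping is right, each of my loops would become one small function like `updateHxPoint`, and since the loops depend on each other they would have to run one after another.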

Any help is highly appreciated. Thank you all in advance.

Note: I am still waiting for the GPU to arrive via DHL, which is why I couldn't try these options myself. It may arrive by next Monday, and my boss wants me to keep working on the code until then. I tried to install the CUDA driver to use the emulator, but it gave me an error.