If I have code that takes a struct variable as input and manipulates its elements, how can I parallelize it using CUDA?
```c
void BackpropagateLayer(NET* Net, LAYER* Upper, LAYER* Lower)
{
    INT  i, j;
    REAL Out, Err;

    for (i = 1; i <= Lower->Units; i++) {
        Out = Lower->Output[i];
        Err = 0;
        for (j = 1; j <= Upper->Units; j++) {
            Err += Upper->Weight[j][i] * Upper->Error[j];
        }
        Lower->Error[i] = Net->Gain * Out * (1 - Out) * Err;
    }
}
```
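Written out, the loop computes the following. The notation is mine: $n_U$ = `Upper->Units`, $n_L$ = `Lower->Units`, $W_{ji}$ = `Upper->Weight[j][i]`, $\delta^U_j$ = `Upper->Error[j]`, $o_i$ = `Lower->Output[i]`, $G$ = `Net->Gain`. Each iteration $i$ reads only shared inputs and writes only `Lower->Error[i]`, so the $n_L$ outer iterations are independent and could each run on their own thread; the inner sum is a transposed matrix-vector product, which is the part a library such as cuBLAS could take over.

$$
\delta^{L}_i = G \, o_i \,(1 - o_i) \sum_{j=1}^{n_U} W_{ji} \, \delta^{U}_j, \qquad i = 1, \dots, n_L
$$

$$
\boldsymbol{\delta}^{L} = G \; \mathbf{o} \odot (\mathbf{1} - \mathbf{o}) \odot \left( W^{\top} \boldsymbol{\delta}^{U} \right)
$$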
Where NET and LAYER are structs defined as:
```c
typedef struct {              /* A LAYER OF A NET:                      */
    INT    Units;             /* - number of units in this layer        */
    REAL*  Output;            /* - output of ith unit                   */
    REAL*  Error;             /* - error term of ith unit               */
    REAL** Weight;            /* - connection weights to ith unit       */
    REAL** WeightSave;        /* - saved weights for stopped training   */
    REAL** dWeight;           /* - last weight deltas for momentum      */
} LAYER;

typedef struct {              /* A NET:                                 */
    LAYER** Layer;            /* - layers of this net                   */
    LAYER*  InputLayer;       /* - input layer                          */
    LAYER*  OutputLayer;      /* - output layer                         */
    REAL    Alpha;            /* - momentum factor                      */
    REAL    Eta;              /* - learning rate                        */
    REAL    Gain;             /* - gain of sigmoid function             */
    REAL    Error;            /* - total net error                      */
} NET;
```
My idea is to first flatten the 2D `Weight` array into a 1D array and then pass it to a kernel that computes the product, or simply use the cuBLAS library. Any suggestions?
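A minimal C sketch of that flattening idea, assuming `INT` and `REAL` are `int` and `double` (their definitions are not shown in the question) and that `Weight`, `Output` and `Error` use the same 1-based indexing as `BackpropagateLayer`. The names `FlattenWeights`, `BackpropUnit` and `BackpropFlat` are hypothetical. `BackpropUnit` isolates the work for one lower-layer unit; since every call writes a different element of `errLower`, this is the body one CUDA thread would run, and `BackpropFlat` is only a host-side loop standing in for a kernel launch.

```c
#include <stdlib.h>

/* Assumption: INT and REAL are int and double; adjust to your headers. */
typedef int    INT;
typedef double REAL;

/* Copy the 1-based Weight[1..nUpper][1..nLower] into one contiguous,
   0-based, row-major array: w[(j-1)*nLower + (i-1)] = Weight[j][i].
   Column 0 of each row is not read by BackpropagateLayer, so it is
   skipped. Returns NULL on allocation failure; the caller frees it. */
REAL* FlattenWeights(REAL** Weight, INT nUpper, INT nLower)
{
    REAL* w = malloc((size_t)nUpper * nLower * sizeof *w);
    if (w == NULL)
        return NULL;
    for (INT j = 1; j <= nUpper; j++)
        for (INT i = 1; i <= nLower; i++)
            w[(size_t)(j - 1) * nLower + (i - 1)] = Weight[j][i];
    return w;
}

/* Work for a single lower-layer unit i (0-based). It reads only shared
   inputs and writes only errLower[i]; in a hypothetical CUDA kernel, i
   would come from blockIdx.x * blockDim.x + threadIdx.x. */
void BackpropUnit(INT i, const REAL* w, const REAL* errUpper,
                  const REAL* outLower, REAL* errLower,
                  INT nUpper, INT nLower, REAL gain)
{
    REAL err = 0;
    /* Walk column i of the row-major matrix: sum_j W[j][i] * errUpper[j]. */
    for (INT j = 0; j < nUpper; j++)
        err += w[(size_t)j * nLower + i] * errUpper[j];
    REAL out = outLower[i];
    errLower[i] = gain * out * (1 - out) * err;
}

/* Host-side stand-in for the kernel launch: one call per lower unit. */
void BackpropFlat(const REAL* w, const REAL* errUpper, const REAL* outLower,
                  REAL* errLower, INT nUpper, INT nLower, REAL gain)
{
    for (INT i = 0; i < nLower; i++)
        BackpropUnit(i, w, errUpper, outLower, errLower, nUpper, nLower, gain);
}
```

With the existing structs, the call would be `FlattenWeights(Upper->Weight, Upper->Units, Lower->Units)`, then passing `Upper->Error + 1`, `Lower->Output + 1` and `Lower->Error + 1` so the 1-based arrays line up with the 0-based indices. On the GPU, it is these flat buffers that would be copied to device memory with `cudaMemcpy`, not the `NET`/`LAYER` structs themselves, because their pointer members hold host addresses. If you use cuBLAS instead, note that it assumes column-major storage: it sees the row-major `w` as an `nLower` x `nUpper` matrix with `lda = nLower`, so `cublasDgemv` with `CUBLAS_OP_N` yields the transposed product directly, and the elementwise `Gain * Out * (1 - Out)` scaling would still need a small kernel.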