I have a problem with my neural network calculation. The number of threads should equal the number of neurons in a layer.
My problem now is: if I have more than 512 neurons, my calculation fails!
Because the CPU code works with more than 512 neurons, and the GPU code also works up to 512 neurons, I think the problem might lie in the way I call my kernels.
If I have 1024 neurons/threads to handle, do I have to launch the kernels below with <<<2,512>>>, or should <<<1,1024>>> work too?
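On the first question: <<<2,512>>> is the usual way to get 1024 threads on a device limited to 512 threads per block, but only if each kernel computes a global index from blockIdx.x, blockDim.x and threadIdx.x rather than using threadIdx.x alone. The kernel bodies aren't shown; if they index with threadIdx.x only, that would work with a single block and break as soon as the work is split. <<<1,1024>>> only works on devices whose per-block limit is at least 1024. A minimal sketch, where markNeurons, blocksFor and coveredOnce are hypothetical names and the block sizes are arbitrary: the grid size is rounded up, surplus threads are guarded off, and the result confirms each neuron index is handled exactly once.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Ceiling division: smallest number of blocks with blocks * threadsPerBlock >= n.
int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical kernel: each thread derives one global neuron index and records a visit.
__global__ void markNeurons(int* hits, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index across all blocks
    if (i < n) {                                     // surplus threads of the last block do nothing
        atomicAdd(&hits[i], 1);
    }
}

// Launches markNeurons as <<<blocksFor(n, t), t>>> and checks that every
// index in [0, n) was visited exactly once.
bool coveredOnce(int n, int threadsPerBlock) {
    int* dHits;
    cudaMalloc(&dHits, n * sizeof(int));
    cudaMemset(dHits, 0, n * sizeof(int));
    markNeurons<<<blocksFor(n, threadsPerBlock), threadsPerBlock>>>(dHits, n);
    std::vector<int> hits(n);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(hits.data(), dHits, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dHits);
    for (int h : hits) {
        if (h != 1) return false;
    }
    return true;
}

int main() {
    std::printf("1024 neurons as <<<%d, 512>>>: %s\n", blocksFor(1024, 512),
                coveredOnce(1024, 512) ? "all covered" : "gaps");
    std::printf("1000 neurons as <<<%d, 256>>>: %s\n", blocksFor(1000, 256),
                coveredOnce(1000, 256) ? "all covered" : "gaps");
    return 0;
}
```

The guard matters whenever the neuron count is not a multiple of the block size: the last block then has threads with no neuron to work on.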
So what exactly happens when I choose more threads per block than my device supports?
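As for what happens: the per-block limit is a device property (512 threads on compute capability 1.x GPUs, 1024 on later ones), and a launch that requests more threads per block than that is rejected with cudaErrorInvalidConfiguration. The kernel never runs, and since a <<<...>>> launch returns no status itself, the failure is only visible if cudaGetLastError() is checked afterwards; otherwise the output buffers keep their old contents and the results are silently wrong. A minimal sketch of the query and the check, using a hypothetical empty kernel dummyKernel and hypothetical helpers maxThreadsPerBlock and tryLaunch:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel used only to test launch configurations; it does nothing.
__global__ void dummyKernel() {}

// Returns the maximum number of threads per block supported by device 0.
int maxThreadsPerBlock() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    return prop.maxThreadsPerBlock;
}

// Launches dummyKernel with the given block size and returns the resulting error.
cudaError_t tryLaunch(int threadsPerBlock) {
    dummyKernel<<<1, threadsPerBlock>>>();
    cudaError_t err = cudaGetLastError();  // reports configuration errors of the launch
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();     // reports errors raised while the kernel runs
    }
    return err;
}

int main() {
    int limit = maxThreadsPerBlock();
    std::printf("maxThreadsPerBlock = %d\n", limit);

    cudaError_t ok = tryLaunch(limit);
    std::printf("<<<1, %d>>>: %s\n", limit, cudaGetErrorString(ok));

    cudaError_t tooMany = tryLaunch(limit + 1);
    std::printf("<<<1, %d>>>: %s\n", limit + 1, cudaGetErrorString(tooMany));
    return 0;
}
```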
```cuda
for(int i = 0; i < iRuns; i++) {
    std::cout << "Training run: " << i << std::endl;
    for(int y = 0; y < Weights.GetH(); y++) {
        devRunFW <<< 1, Neurons.GetW() >>>
            (pNet[y].pNeurons,
             pNet[y].pWeights,
             pNet[y+1].pNeurons,
             Weights.GetD() );
    }
    devUpdateOutpDelta <<< 1, iInpS >>>
        (pNet[Neurons.GetH()-1].pNeurons,
         pNet[Neurons.GetH()-1].pErrors,
         pOut_dev);
    for(int y = Weights.GetH()-1; y >= 0; y--) {
        devCalcErrorDelta <<< 1, Neurons.GetW() >>>
            (pNet[y].pNeurons,
             pNet[y].pWeights,
             pNet[y].pErrors,
             pNet[y+1].pErrors,
             Weights.GetD() );
    }
    for(int y = Weights.GetH()-1; y >= 0; y--) {
        devAdaptWeights <<< 1, Neurons.GetW() >>>
            (pNet[y].pNeurons,
             pNet[y].pWeights,
             pNet[y].pErrors,
             pNet[y+1].pErrors,
             Weights.GetD(), fLearningRate);
    }
}
```
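In the loop above, every launch of the form <<< 1, Neurons.GetW() >>> puts the whole layer into a single block, so it would need a multi-block grid plus a bounds check inside the kernel. The kernel bodies aren't shown, so this is only a sketch with a hypothetical stand-in, forwardLayer, assuming each thread computes one output neuron as a weighted sum of the inputs, with weights stored row-major as weights[out * nIn + in] and a sigmoid activation; both the layout and the activation are assumptions, not the poster's code.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for devRunFW: one thread per output neuron.
// Assumes weights are stored row-major: weights[out * nIn + in].
__global__ void forwardLayer(const float* in, const float* weights,
                             float* out, int nIn, int nOut) {
    int o = blockIdx.x * blockDim.x + threadIdx.x;  // global neuron index
    if (o >= nOut) return;                          // surplus threads in the last block exit
    float sum = 0.0f;
    for (int i = 0; i < nIn; ++i) {
        sum += weights[o * nIn + i] * in[i];
    }
    out[o] = 1.0f / (1.0f + expf(-sum));            // sigmoid activation (assumed)
}

// Runs one layer on the GPU with as many blocks as needed for nOut neurons.
std::vector<float> runLayer(const std::vector<float>& in,
                            const std::vector<float>& weights, int nOut) {
    int nIn = static_cast<int>(in.size());
    float *dIn, *dW, *dOut;
    cudaMalloc(&dIn, nIn * sizeof(float));
    cudaMalloc(&dW, weights.size() * sizeof(float));
    cudaMalloc(&dOut, nOut * sizeof(float));
    cudaMemcpy(dIn, in.data(), nIn * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dW, weights.data(), weights.size() * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256;                            // within the 512 limit of older GPUs
    const int blocks = (nOut + threads - 1) / threads;  // round up so every neuron gets a thread
    forwardLayer<<<blocks, threads>>>(dIn, dW, dOut, nIn, nOut);
    cudaError_t err = cudaGetLastError();               // launch errors are otherwise silent
    if (err != cudaSuccess) {
        std::fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
    }

    std::vector<float> out(nOut);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(out.data(), dOut, nOut * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dW);
    cudaFree(dOut);
    return out;
}

int main() {
    const int nIn = 4, nOut = 1024;                // more than 512 neurons in the layer
    std::vector<float> in(nIn, 1.0f);
    std::vector<float> weights(nIn * nOut, 0.0f);  // all-zero weights give sigmoid(0) = 0.5
    std::vector<float> out = runLayer(in, weights, nOut);
    std::printf("out[0] = %f, out[%d] = %f\n", out[0], nOut - 1, out[nOut - 1]);
    return 0;
}
```

The same pattern (rounded-up grid, global index, early return) would apply to the other kernels in the loop, each launched with its own neuron count.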