I am working on a project for which I can't post any code. The project involves training a deep network using cuDNN.

I have defined a network with several layers using cuDNN. The network takes two inputs, `float* input1` and `float* input2`, and generates two outputs, `float* output1` and `float* output2`.

The two outputs are passed to a custom kernel that computes a difference, which I call `float* target`. This difference is then back-propagated to adjust the network weights.

I launch as many host threads as the batch size, and each thread repeats the following in an infinite loop (pseudo-code):

    output1 = network.forward(input1)
    output2 = network.forward(input2)

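For completeness, a rough sketch of the full per-thread iteration, including the steps described above (the names here are placeholders; the real code differs):

    // each of the batch_size threads runs:
    loop forever:
        output1 = network.forward(input1)
        output2 = network.forward(input2)
        target  = difference_kernel(output1, output2)  // custom CUDA kernel computing the difference
        network.backward(target)                       // backprop step; this is where
                                                       // cudnnConvolutionBackwardBias eventually fails
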
The system works when the batch size is 1 (and a single thread is launched). However, if the batch size is greater than 1 (more than one thread is launched), the code runs correctly for some time and then suddenly returns a wrong target, after which the backprop step fails in cudnnConvolutionBackwardBias with error status CUDNN_STATUS_EXECUTION_FAILED.

The larger the batch size, the sooner this failure occurs.

I know my description is cryptic, but does anyone have an idea of what might be going wrong?

Kind regards