I am working on a project for which I can't post any code. The project involves training a deep network using cuDNN.

I have defined a network with several layers using cuDNN. The network takes two inputs, `float* input1` and `float* input2`, and generates two outputs, `float* output1` and `float* output2`.

The two outputs are passed to a custom kernel that computes a difference, which I call `float* target`. This difference is then back-propagated to adjust the network weights.

I launch as many host threads as the batch size, and each thread repeats the following in an infinite loop (pseudo-code):

    output1 = network.forward(input1)
    output2 = network.forward(input2)

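For completeness, a rough sketch of the full per-thread iteration, including the steps described above (the names here are placeholders; the real code differs):

    // each of the batch_size threads runs:
    loop forever:
        output1 = network.forward(input1)
        output2 = network.forward(input2)
        target  = difference_kernel(output1, output2)  // custom CUDA kernel computing the difference
        network.backward(target)                       // backprop step; this is where
                                                       // cudnnConvolutionBackwardBias eventually fails
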
The system works when the batch size is 1 (and a single thread is launched). However, if the batch size is greater than 1 (more than one thread is launched), the code runs correctly for some time and then suddenly returns a wrong target, after which the backprop step fails in cudnnConvolutionBackwardBias with error status CUDNN_STATUS_EXECUTION_FAILED.

The larger the batch size, the sooner this failure occurs.

I know my description is cryptic, but does anyone have an idea of what might be going wrong?

Kind regards