How to synchronize data computed by the API "cudnnBatchNormalizationBackward" across multiple GPUs

We are trying to train a model on multiple GPUs (probably 3 or 4), and we have a problem synchronizing the data computed on the different GPUs when we call the API "cudnnBatchNormalizationBackward".
How should we do this, or is there another API we could call to do batch normalization across multiple GPUs?