cuDNN BatchNorm backward result is not correct? Large difference from the CPU result.

We observed a large difference between the batch normalization backward results computed by cuDNN and on the CPU.
Can a cuDNN expert share some information about why the cuDNN and CPU results for the batch normalization backward calculation differ?

  • The related cuDNN call is wrap::cudnnBatchNormalizationBackward() (tensorflow-1.8.0/tensorflow/stream_executor/cuda/cuda_dnn.cc, line 3138)

We are using the following environment:

  • cuda-9.0.176.3
  • cudnn-9.0-linux-x64-v7 (libcudnn.so.7.0.5)
  • NVIDIA TITAN V (Volta), with nvidia-driver-x86_64-390.77.run (on Ubuntu 16.04)
  • tensorflow-gpu-1.8.0

Please see below for the detailed background:

To evaluate the float16 batch normalization BACKWARD calculation with cuDNN (Volta Tensor Cores) and on the CPU:
We first dumped a data set from a real model run (with cuDNN), then fed it into the following four standalone TensorFlow scripts and got four outputs (a minimal sketch of the comparison follows the list):

  1. cuDNN with float16 input, float16 output
  2. cuDNN with float32 input, float32 output then converted to float16
  3. CPU with float16 input, float16 output
  4. CPU with float32 input, float32 output then converted to float16

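For reference, here is a minimal sketch (not our actual scripts) of how cases 1 and 4 can be reproduced with the TF 1.x API. The tensor shape and the random data below are placeholders; in our real experiment the input, the upstream gradient, and the scale/offset were dumped from the model run.

```python
import numpy as np
import tensorflow as tf  # written against the TF 1.x API (we use 1.8.0)

# Placeholder data; in the real experiment these tensors were dumped from the model.
N, H, W, C = 32, 14, 14, 64
x_np  = np.random.randn(N, H, W, C).astype(np.float32)
dy_np = np.random.randn(N, H, W, C).astype(np.float32)
scale_np  = np.random.randn(C).astype(np.float32)
offset_np = np.zeros(C, dtype=np.float32)

def bn_backward(device, dtype):
    """Run fused batch norm forward + backward on `device` and return
    (dx, dscale, doffset) cast to float32 for comparison."""
    g = tf.Graph()
    with g.as_default(), tf.device(device):
        x  = tf.constant(x_np.astype(dtype))
        dy = tf.constant(dy_np.astype(dtype))
        scale  = tf.constant(scale_np)   # scale/offset stay float32
        offset = tf.constant(offset_np)
        y, _, _ = tf.nn.fused_batch_norm(x, scale, offset, is_training=True)
        grads = tf.gradients(y, [x, scale, offset], grad_ys=dy)
    with tf.Session(graph=g) as sess:
        return [v.astype(np.float32) for v in sess.run(grads)]

gpu_fp16 = bn_backward('/gpu:0', np.float16)   # case 1: cuDNN path, float16 in/out
cpu_fp32 = bn_backward('/cpu:0', np.float32)   # case 4: CPU path, float32

# Save dx from both runs for later comparison (filenames are arbitrary).
np.save('dx_gpu_fp16.npy', gpu_fp16[0])
np.save('dx_cpu_fp32.npy', cpu_fp32[0])
```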
We expected the four float16 outputs to be very close, with no big difference, but what we actually found is:

  • outputs 1 and 2 show no difference from the real reference data (which confirms the dumped data is correct)
  • there is no difference between outputs 1 and 2
  • there is no difference between outputs 3 and 4
  • the difference between output 3 (or 4) and output 1 (or 2) is unexpectedly big: the average relative diff is about 67% (a sketch of this metric follows the list)
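The 67% figure comes from an element-wise relative-difference metric along the lines of the snippet below. The filenames are the hypothetical ones from the sketch above, and the guard value in the denominator may differ slightly from our original script.

```python
import numpy as np

# Hypothetical filenames from the sketch above; any pair of the four outputs
# can be compared the same way.
a = np.load('dx_gpu_fp16.npy').astype(np.float32)   # output 1 (or 2)
b = np.load('dx_cpu_fp32.npy').astype(np.float32)   # output 3 (or 4)

# Element-wise relative difference, guarded against division by zero.
rel = np.abs(a - b) / (np.abs(b) + 1e-6)
print('average relative diff: %.2f%%' % (100.0 * rel.mean()))
```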

Our concerns are:

  1. Why is the difference between outputs 3 (or 4) and 1 (or 2) so big?
  2. Does the difference have any side effect on convergence for models that use the CPU float32 calculation?

Could any cuDNN expert share information about the difference between the cuDNN and CPU calculations for the batch normalization backward pass?