NVIDIA_TF32_OVERRIDE=0 not disabling TF32 in cublas

In spite of setting NVIDIA_TF32_OVERRIDE=0, I see the following.

I tensorflow/stream_executor/cuda/cuda_blas.cc:1760] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.

How should TF32 be disabled, then?
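For reference, this is how the override was being applied. A minimal sketch of the two usual ways to set it, assuming a TensorFlow script (`train.py` is a placeholder name, not from this thread):

```shell
# Disable TF32 math for a single run only.
NVIDIA_TF32_OVERRIDE=0 python train.py

# Or export it for the whole shell session, so every
# subsequent process inherits it.
export NVIDIA_TF32_OVERRIDE=0
python train.py
```

Note the variable must be set in the environment of the process that loads cuBLAS; setting it after the library has initialized has no effect.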

How do you know you’re executing on Tensor Cores?

You could try changing CUBLAS_TF32_TENSOR_OP_MATH here to CUBLAS_DEFAULT_MATH

Thanks for confirming that there is no way to turn off TF32 in CUBLAS without rebuilding TF.

That’s not exactly what I said.

I merely offered a suggestion to verify what you’re seeing.

I’m curious: how do you know NVIDIA_TF32_OVERRIDE=0 is not working? It’s possible there is a bug.
Can you provide a minimal reproducer?

Ahh, I see. Because the following is logged. Doesn’t that line imply TF32 is being used, or is it a spurious log?

“I tensorflow/stream_executor/cuda/cuda_blas.cc:1760] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.”

I think I understand the confusion now.
You’re seeing a runtime log that is triggered simply because the data type is float32.
Setting NVIDIA_TF32_OVERRIDE=0 doesn’t make that log message go away.
You need to profile your code with and without the override to see which kernels actually run.
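One way to do that comparison is with Nsight Systems; a sketch, assuming `nsys` is installed and `train.py` stands in for your own script:

```shell
# Capture a profile with TF32 allowed (the default)...
nsys profile -o tf32_on python train.py

# ...and one with the override disabling TF32.
NVIDIA_TF32_OVERRIDE=0 nsys profile -o tf32_off python train.py

# Summarize each report and compare the GPU kernel names/timings.
nsys stats tf32_on.nsys-rep
nsys stats tf32_off.nsys-rep
```

If the override is taking effect, the matmul kernels (and typically their runtimes) should differ between the two reports.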

You might try utilizing cuBLAS logging to see which kernels are being called.
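For reference, cuBLASLt logging is controlled by environment variables; a minimal sketch (the script name is a placeholder):

```shell
# Level 5 = API trace; log to a file instead of stdout.
export CUBLASLT_LOG_LEVEL=5
export CUBLASLT_LOG_FILE=cublaslt.log

NVIDIA_TF32_OVERRIDE=0 python train.py

# Look for TF32 in the logged heuristics/compute types.
grep -i tf32 cublaslt.log
```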

With CUBLASLT_LOG_LEVEL=5, I only see the following API calls with both NVIDIA_TF32_OVERRIDE=0 and NVIDIA_TF32_OVERRIDE=1:

[cublasLtCreate]
[cublasLtCtxInit]
[cublasLtSSSMatmulAlgoGetHeuristic]
[cublasLtSSSMatmul]

Both seem to use the following:

2022-01-12 03:29:14][cublasLt][1420][Api][cublasLtSSSMatmulAlgoGetHeuristic] Adesc=[type=R_32F rows=200 cols=200 ld=200] Bdesc=[type=R_32F rows=200 cols=128 ld=200] Cdesc=[type=R_32F rows=200 cols=128 ld=200] preference=[maxWavesCount=0.0 gaussianModeMask=3M_MODE_DISALLOWED pointerModeMask=0 maxWorkspaceSizeinBytes=4194304 minBytesAlignmentA=16 minBytesAlignmentB=16 minBytesAlignmentC=16 minBytesAlignmentD=16 smCountTarget=108] computeDesc=[computeType=COMPUTE_32F_FAST_TF32 scaleType=R_32F]

This might help: tf.config.experimental.enable_tensor_float_32_execution | TensorFlow Core v2.7.0