Description
I want to compare the performance of convolutions using TF32 and FP32 on an RTX 3090, but I find that TF32 is no faster than FP32. Why?
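For context on what the two modes trade off: TF32 keeps FP32's sign bit and 8-bit exponent (so the same range) but only 10 explicit mantissa bits instead of 23. The NumPy sketch below approximates that precision loss. It assumes simple truncation of the low 13 mantissa bits; the actual hardware conversion may round differently, and `to_tf32` is a hypothetical helper, not a TensorFlow function.

```python
import numpy as np

def to_tf32(a):
    """Approximate TF32 precision for float32 input (hypothetical helper).

    FP32: 1 sign, 8 exponent, 23 mantissa bits.
    TF32: same sign and exponent, only the top 10 mantissa bits.
    Assumption: truncation (round toward zero); real hardware may round differently.
    """
    bits = np.asarray(a, dtype=np.float32).view(np.uint32)
    # Keep the top 19 bits (1 + 8 + 10) and clear the 13 low mantissa bits.
    return (bits & np.uint32(0xFFFFE000)).view(np.float32)

if __name__ == "__main__":
    # 1 + 2**-11 needs 11 mantissa bits: exact in FP32, not representable in TF32.
    x = np.float32(1.0 + 2.0**-11)
    print(x, to_tf32(x))
```

Only the stored precision changes; the speedup TF32 promises comes from running the math on Tensor Cores, which is what the profiling below is meant to measure.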
Environment
TensorRT Version:
GPU Type: GeForce RTX 3090
Nvidia Driver Version: 455.38
CUDA Version: 11.1
CUDNN Version: 8.0.5
Operating System + Version: CentOS Linux release 7.4.1708 (Core)
Python Version (if applicable): 3.6.8
TensorFlow Version (if applicable): 2.4.0
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
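import tensorflow as tf
import numpy as np

# Disable TF32 so the convolution runs in full FP32.
# Change False to True for the TF32 run and compare the two profiles.
tf.config.experimental.enable_tensor_float_32_execution(False)

# Input in NHWC layout: batch=1, height=5, width=5, channels=1
x_in = np.array([[
[[2], [1], [2], [0], [1]],
[[1], [3], [2], [2], [3]],
[[1], [1], [3], [3], [0]],
[[2], [2], [0], [1], [1]],
[[0], [0], [3], [1], [2]], ]])
# Filter in HWIO layout: 2x2 window, 1 input channel, 2 output channels
kernel_in = np.array([
[[[2, 0.1]], [[3, 0.2]]],
[[[0, 0.3]], [[1, 0.4]]], ])
x = tf.constant(x_in, dtype=tf.float32)
kernel = tf.constant(kernel_in, dtype=tf.float32)
# Stride 1, no padding: the output shape is (1, 4, 4, 2)
out = tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='VALID')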
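To check what the repro actually computes, here is a NumPy-only sketch of `tf.nn.conv2d` with stride 1 and `padding='VALID'`, using the same input and filter. `conv2d_valid` and `conv2d_macs` are hypothetical helpers, not TensorFlow functions. Like TensorFlow, the sketch computes a cross-correlation (the filter is not flipped). It also counts multiply-accumulate operations (MACs): the whole problem is 4 x 4 output positions x 2 output channels x (2 x 2 x 1) taps = 128 MACs. At that size, fixed per-kernel costs rather than arithmetic throughput likely dominate the measured time in either mode.

```python
import numpy as np

def conv2d_valid(x, k):
    """Reference for tf.nn.conv2d(strides=1, padding='VALID'), NHWC input, HWIO filter."""
    n, h, w, c_in = x.shape
    kh, kw, k_in, c_out = k.shape
    if c_in != k_in:
        raise ValueError("input channels of x and k must match")
    oh, ow = h - kh + 1, w - kw + 1
    out = np.zeros((n, oh, ow, c_out), dtype=np.float32)
    for i in range(oh):
        for j in range(ow):
            patch = x[:, i:i + kh, j:j + kw, :]  # shape (n, kh, kw, c_in)
            # Sum over window height, width and input channels -> (n, c_out).
            out[:, i, j, :] = np.tensordot(patch, k, axes=([1, 2, 3], [0, 1, 2]))
    return out

def conv2d_macs(x_shape, k_shape):
    """Multiply-accumulate count for a stride-1 VALID convolution."""
    n, h, w, c_in = x_shape
    kh, kw, _, c_out = k_shape
    oh, ow = h - kh + 1, w - kw + 1
    return n * oh * ow * c_out * kh * kw * c_in

# Same data as the repro above.
x_in = np.array([[
    [[2], [1], [2], [0], [1]],
    [[1], [3], [2], [2], [3]],
    [[1], [1], [3], [3], [0]],
    [[2], [2], [0], [1], [1]],
    [[0], [0], [3], [1], [2]],
]], dtype=np.float32)
kernel_in = np.array([
    [[[2, 0.1]], [[3, 0.2]]],
    [[[0, 0.3]], [[1, 0.4]]],
], dtype=np.float32)

out = conv2d_valid(x_in, kernel_in)
print(out.shape, conv2d_macs(x_in.shape, kernel_in.shape))
```

Comparing this reference against the TensorFlow output (for example with `np.allclose`) is a quick sanity check for both the FP32 and the TF32 run.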
Steps To Reproduce
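Save the code to a file named “test_conv.py” and run “nsys nvprof python3 test_conv.py” in a terminal; the output lists the time of every kernel. Run it once with enable_tensor_float_32_execution(False) and once with True to compare FP32 and TF32.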