Why is my graphics card's floating point performance abnormally low?

I tested the floating point performance of my GTX 1660 SUPER, and the result is only 0.5 TFLOPS, which is about 5% of what the spec says (roughly 10 TFLOPS). I tried updating the driver, but it didn't help.

I also get normal performance in gaming, so I think the video card itself is working fine.

The script I used is as follows.

from transformers import AutoConfig, BertLayer
from torch.utils import benchmark
import pandas as pd
from collections import defaultdict
import inspect
import torch
print('Pytorch version\t:', torch.__version__)
print('CUDA version\t:', torch.version.cuda)
print('GPU\t\t:', torch.cuda.get_device_name())

pd.options.display.precision = 3


def var_dict(*args):
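    # Map each object passed in back to its variable name in the caller's scope,
    # so the result can be used as the `globals` dict for benchmark.Timer.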
    callers_local_vars = inspect.currentframe().f_back.f_locals.items()
    return dict([(name, val) for name, val in callers_local_vars if val is arg][0]
                for arg in args)


def walltime(stmt, arg_dict, duration=5):
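    # Return the median wall-clock time (in seconds) of `stmt`, measured with
    # torch.utils.benchmark over at least `duration` seconds.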
    return benchmark.Timer(stmt=stmt, globals=arg_dict).blocked_autorange(
        min_run_time=duration).median


config = AutoConfig.from_pretrained("bert-large-uncased")
layer = BertLayer(config).half().cuda()
h, b, s = config.hidden_size, 64, 128
X = torch.randn(b, s, h).half().cuda()
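# FLOP counts used below, for BERT-large (h = 1024, intermediate size 4h):
# - intermediate dense: a (b*s, h) x (h, 4h) matmul = 2*b*s*h*4h = 8*b*s*h*h FLOPs
# - FFN: adds the (4h -> h) output projection, 16*b*s*h*h in total
# - attention: four (h -> h) projections (8*b*s*h*h) plus the QK^T and
#   attention-weighted-sum matmuls (4*b*h*s*s)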

print('Dense layer TFLOPS: %.3f' % (8*b*s*h*h / 1e12 / walltime(
    'layer.intermediate.dense(X)', var_dict(layer, X))))
print('Dense+Activation TFLOPS: %.3f' % (8*b*s*h*h / 1e12 / walltime(
    'layer.intermediate(X)', var_dict(layer, X))))
ffn = 16*b*s*h*h / 1e12
print('FFN TFLOPS: %.3f' % (ffn / walltime(
    'layer.output(layer.intermediate(X),X)', var_dict(layer, X))))
att = (4*b*h*s*s + 8*b*s*h*h) / 1e12
print('Attention TFLOPS: %.3f' % (
    att / walltime('layer.attention(X)', var_dict(layer, X))))

And the output is

Pytorch version	: 2.2.1+cu118
CUDA version	: 11.8
GPU		: NVIDIA GeForce GTX 1660 SUPER
Dense layer TFLOPS: 0.581
Dense+Activation TFLOPS: 0.578
FFN TFLOPS: 0.570
Attention TFLOPS: 0.556
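
For a cross-check that leaves the transformers layers out entirely, the same measurement can be made on a bare FP16 matmul with the shape of the intermediate dense layer. This is only a minimal sketch; the b, s, h values simply mirror the script above.

import torch
from torch.utils import benchmark

b, s, h = 64, 128, 1024  # same sizes as in the BERT-large script above
A = torch.randn(b * s, h, dtype=torch.half, device='cuda')
B = torch.randn(h, 4 * h, dtype=torch.half, device='cuda')

t = benchmark.Timer(stmt='A @ B', globals={'A': A, 'B': B}).blocked_autorange(
    min_run_time=5).median
print('Raw FP16 matmul TFLOPS: %.3f' % (2 * (b * s) * h * (4 * h) / 1e12 / t))

If this bare matmul also lands around 0.5 TFLOPS, whatever is limiting throughput sits in the FP16 matmul path itself rather than in the model code.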

Hi @908786794,
This forum covers issues related to cuDNN.
I would recommend raising this on the appropriate forum to get better assistance.

Thanks