I ran a simple Python script that uses Keras with tensorflow-gpu and compared the results on bhn25 (A100, your machine) with those on a P100 machine from the ISE faculty.
One of the main problems is that on bhn25 about 20 minutes pass between the following two log lines:
2021-04-05 12:34:13.781180: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-04-05 12:54:17.180852: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
After that, GPU performance is excellent.
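
To put an exact number on the delay, here is a minimal sketch that computes the gap between the two log timestamps quoted above. It assumes every log line starts with a timestamp in the form "YYYY-MM-DD HH:MM:SS.ffffff"; the helper names parse_timestamp and gap_seconds are my own, not part of TensorFlow.

```python
from datetime import datetime

# The two log lines quoted above, copied verbatim.
LOG_LINES = [
    "2021-04-05 12:34:13.781180: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7",
    "2021-04-05 12:54:17.180852: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10",
]


def parse_timestamp(line):
    """Parse the leading timestamp of a log line (hypothetical helper).

    Assumes the first 26 characters are "YYYY-MM-DD HH:MM:SS.ffffff".
    """
    return datetime.strptime(line[:26], "%Y-%m-%d %H:%M:%S.%f")


def gap_seconds(first, second):
    """Return the time elapsed between two log lines, in seconds."""
    return (parse_timestamp(second) - parse_timestamp(first)).total_seconds()


if __name__ == "__main__":
    gap = gap_seconds(LOG_LINES[0], LOG_LINES[1])
    print(f"{gap:.1f} s ({gap / 60:.1f} min)")
```

For the two lines above this gives roughly 1203.4 seconds, i.e. just over 20 minutes. The same helper can be run on the corresponding lines from the P100 machine for a direct comparison.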