I noticed that CUDA 9.1.128 has been released and that it includes improved GEMM performance. Do these improvements mainly benefit small matrices (i.e., deep learning workloads), or does the DGEMM used by LINPACK profit as well? Also, where can we get the most up-to-date HPL binary for our Tesla V100 (Volta) cards?
I have a second question, related to the Student Cluster Competition. We’d like to squeeze as many GFlop/s/W out of the card as possible. As I’m sure you know, DGEMM does need some memory bandwidth, but not very much. Currently we get close to 30 GFlop/s/W at the optimum core frequency. We hope to improve on this value by lowering the memory clock, and with it the power dissipated by the memory system. However, nvidia-smi currently reports only a single supported memory clock:
$ nvidia-smi -q -d SUPPORTED_CLOCKS | grep Memory
Memory : 877 MHz
Is this a restriction imposed by the driver (and if so, can you provide a workaround), or is there simply no hardware support for changing the memory clock?
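In case it helps anyone reproduce the check on other cards, here is a minimal Python sketch that groups the `nvidia-smi -q -d SUPPORTED_CLOCKS` listing by memory clock. The function name `parse_supported_clocks` is my own, and the graphics clock values in the sample text are placeholders for illustration only; just the 877 MHz memory clock comes from our card. I'm also assuming the usual layout where each "Memory" line is followed by the "Graphics" lines that belong to it.

```python
import re


def parse_supported_clocks(text):
    """Map each memory clock (MHz) to the graphics clocks (MHz) listed under it.

    Assumes every 'Graphics : N MHz' line belongs to the most recent
    'Memory : N MHz' line above it.
    """
    clocks = {}
    current_mem = None
    for line in text.splitlines():
        m = re.match(r"\s*(Memory|Graphics)\s*:\s*(\d+)\s*MHz", line)
        if not m:
            continue  # skip headers and unrelated lines
        kind, mhz = m.group(1), int(m.group(2))
        if kind == "Memory":
            current_mem = mhz
            clocks.setdefault(mhz, [])
        elif current_mem is not None:
            clocks[current_mem].append(mhz)
    return clocks


# Sample text: 877 MHz is from the output above; the graphics
# values are placeholders, not measured data.
sample = """\
    Supported Clocks
        Memory                      : 877 MHz
            Graphics                : 1380 MHz
            Graphics                : 1372 MHz
"""

supported = parse_supported_clocks(sample)
print("memory clocks found:", sorted(supported))

# To run against a real card instead of the sample text:
#   import subprocess
#   text = subprocess.run(["nvidia-smi", "-q", "-d", "SUPPORTED_CLOCKS"],
#                         capture_output=True, text=True).stdout
#   supported = parse_supported_clocks(text)
```

If the resulting dictionary has a single key, the driver is exposing only one memory clock, which is what we see on our cards.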