Hi there,
I found in this link
https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#tile-quant that Figure 7 shows the tile quantization effect on an NVIDIA A100-SXM4-80GB with CUDA 11.2 and cuBLAS 11.4. The caption says it was measured with a function that forces the use of 256x128 tiles over the MxN output matrix. However, as far as I know, cuBLAS does not let the user set a fixed tile size. Please correct me if I am wrong. If it is possible, how was the fixed tile size set in this experiment (Figure 7)?
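For context, here is a minimal sketch of how I understand tile quantization with a fixed 256x128 tile. It is plain arithmetic on tile counts, not a cuBLAS call, and the function name `tile_quantization` is just something I made up for illustration.

```python
import math

def tile_quantization(m, n, tile_m=256, tile_n=128):
    """Return (number of tiles, fraction of computed output that is useful)
    when an m x n output matrix is covered by tile_m x tile_n tiles."""
    # Partial tiles at the edges still cost a full tile of work.
    tiles = math.ceil(m / tile_m) * math.ceil(n / tile_n)
    computed = tiles * tile_m * tile_n   # output elements actually processed
    useful = m * n                       # output elements that are needed
    return tiles, useful / computed

# Going one column past a tile boundary adds a whole extra column of tiles.
for n in (128, 129, 256, 257):
    tiles, efficiency = tile_quantization(256, n)
    print(f"M=256, N={n}: tiles={tiles}, useful fraction={efficiency:.3f}")
```

If this picture is right, the step pattern in Figure 7 would only show up cleanly when the tile size really is held constant, which is why I am asking how that was done.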