Ampere_sgemm_128x128_nn

x-wang20 · March 16, 2022, 1:52pm

When I profile my CUDA program with Nsight Systems, I always see ampere_sgemm_128x128_nn in the nsys window. I'm confused about how my kernel is executed at the CUDA level: is it decomposed into several kernels such as ampere_sgemm_128x128_nn? Also, where can I find references about these kernels?

Robert_Crovella · March 16, 2022, 1:56pm

It’s probably coming from a cuBLAS call, or from a library that uses cuBLAS, such as cuDNN. You won’t find these kernels documented anywhere.
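The kernel internals aren't documented, but the name follows BLAS conventions, which at least tells you what the kernel computes. `sgemm` is the BLAS single-precision general matrix multiply, C = alpha*op(A)*op(B) + beta*C on column-major matrices. The `_nn` suffix is commonly read (an assumption; NVIDIA does not document it) as neither A nor B being transposed, as in a `cublasSgemm` call with `CUBLAS_OP_N` for both operands. Below is a minimal pure-Python sketch of that math. `sgemm_nn_reference` is a hypothetical helper, not a cuBLAS API. It only shows the arithmetic and column-major indexing. Python floats are double precision, so the sketch does not reproduce fp32 rounding or anything about how the GPU kernel is tiled.

```python
def sgemm_nn_reference(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Compute C = alpha*A*B + beta*C in place (hypothetical reference helper).

    Matrices are flat lists in column-major order, as in BLAS/cuBLAS:
    element (row i, col j) of A is stored at a[i + j*lda].
    A is m x k, B is k x n, C is m x n.
    "nn" is assumed to mean no transpose on A or B.
    """
    for j in range(n):          # column of C
        for i in range(m):      # row of C
            acc = 0.0
            for p in range(k):  # dot product of row i of A with column j of B
                acc += a[i + p * lda] * b[p + j * ldb]
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc]
    return c


# A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], stored column-major.
print(sgemm_nn_reference(2, 2, 2, 1.0, [1, 3, 2, 4], 2, [5, 7, 6, 8], 2,
                         0.0, [0.0] * 4, 2))
# prints [19.0, 43.0, 22.0, 50.0], i.e. A*B = [[19, 22], [43, 50]] column-major
```

Seeing such a kernel in the nsys timeline therefore means that some library call (for example a cuBLAS GEMM) launched it. It does not mean your own kernel was split into pieces.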
