meaning of "subunit of the SM"

Hello, I have a few questions.
I’m using Nsight Compute on Windows.
On the Details page (GPU Speed Of Light), SOL SM [%] is described as “Maximum utilization percentage of any subunit of the SM”.
What does “subunit of the SM” mean? Is it a thread?

Also, on the Details page (Memory Workload Analysis), what does the number of instructions for “Global Load Cached” under “First-Level Cache” mean? I know TEX as texture memory. Does the number of “SM->TEX Request” indicate how many requests the SM sends to texture memory?

Thanks in advance for your reply.

Subunit does not mean a thread. It refers to the various components that make up the SM hardware. Among those are, for example, the different execution pipelines that you see in the ‘Pipe Utilization’ chart of the ‘Compute Workload Analysis’ section, e.g. ALU (arithmetic logic unit), FP16 (16-bit floating point unit), Tensor, TEX, etc. We are planning to make it clearer in future versions of Nsight Compute which subunit is the exact limiter.
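To illustrate how "maximum utilization percentage of any subunit" reads, here is a minimal Python sketch. The utilization numbers are made up for illustration and are not real profiler output, the list of subunits is only the handful named above (the real set is larger), and the helper name `sm_utilization` is hypothetical. Nsight Compute derives the actual value from hardware counters; this only shows the "take the maximum over the subunits" idea.

```python
# Hypothetical per-subunit utilization percentages for one kernel.
# These values are invented for illustration; they are not profiler output.
subunit_utilization = {
    "ALU": 42.0,
    "FP16": 3.5,
    "Tensor": 0.0,
    "TEX": 67.25,
}


def sm_utilization(util):
    """Return (limiting subunit, its utilization) as the maximum over subunits."""
    name = max(util, key=util.get)
    return name, util[name]


limiter, sm_pct = sm_utilization(subunit_utilization)
print(f"SM [%] = {sm_pct:.2f} (limited by {limiter})")
```

With these made-up numbers the reported value would be 67.25, coming from the TEX pipeline, even though the other subunits are far less busy.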

As for SM->TEX Requests, this column indicates the number of requests made from the streaming multiprocessor (SM) to the unified first-level cache. In the ‘Global Load Cached’ row, you see the respective value caused by load instructions reading from global memory that were cached (in L1/TEX).

Let us know if there are further questions. Thanks.

Now I see. That was a lot of help. Thank you so much.