What is the local memory window?

The Nsight Compute tooltip for the LDL instruction reads “Load Within Local Memory Window”.
What is this window?
I couldn’t find any documentation about it.

Local memory is a small per-thread carve-out from global memory. I have never heard that being referred to as a window, but it is not an unreasonable characterization. The physical underpinning is the on-board memory of the GPU either way, but local memory is a per-thread mapping of a small portion of it.

There are some details in the PTX ISA documentation.

I know what local memory is and what “window” means in most contexts, but this is the first time I have heard of a “local memory window”.
I thought maybe local memory is split into several windows and a CUDA thread must access local memory in some special way.
Well, maybe I am overthinking it.

I think “window” comes from a view of things that considers the entire (“generic”) address space. The GPU has a unified address space architecture, meaning there is one 64-bit address space that all addressable entities “live” in (as contrasted with, for example, a Harvard architecture device/system).

The GPU does have a partitioned logical address space, meaning “global”, “local”, “shared” and perhaps other logical spaces are distinct. This distinctness includes the idea that (for example) the logical local space will have a particular region in the 64-bit address space in which all logical local entities are located. This isn’t just a random fact but is an important part of how the GPU resolves what to do with a particular “generic” 64-bit address (as well as, on the flip side of the coin, what to do when a 32-bit address is specified).

In this context, the “window” represents the range of generic 64-bit addresses for which the GPU will interpret those addresses/entities as belonging to the logical local space.

Viewed in the generic 64-bit address space, therefore, the logical space is contained within a “window” of addresses - a particular contiguous range.
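
One way to see this classification from CUDA C++ is with the address space predicate functions (`__isLocal`, `__isShared`, `__isGlobal`), which take a generic pointer and report which logical space it falls in. The following is only a minimal sketch: the struct `SpaceFlags`, the kernel `classify` and the helper `run_classify` are names made up for illustration, error checking is omitted, and it assumes the compiler places the stack array in local memory (which taking its address and indexing it at runtime should encourage).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Flags reported by the kernel for three generic pointers (hypothetical type).
struct SpaceFlags {
    unsigned local_in_local;    // stack array tested with __isLocal
    unsigned shared_in_shared;  // __shared__ variable tested with __isShared
    unsigned global_in_global;  // cudaMalloc buffer tested with __isGlobal
    unsigned global_in_local;   // cudaMalloc buffer tested with __isLocal
};

__global__ void classify(SpaceFlags *out, int *global_buf, int idx)
{
    __shared__ int shared_var;
    int local_arr[4];
    // Runtime indexing makes it likely the array lives in local memory
    // rather than being kept entirely in registers.
    for (int i = 0; i < 4; ++i) local_arr[i] = i;
    local_arr[idx & 3] = 7;
    shared_var = local_arr[(idx + 1) & 3];

    // Each predicate takes a generic pointer and reports whether it falls
    // in the corresponding logical space.
    out->local_in_local   = __isLocal(local_arr);
    out->shared_in_shared = __isShared(&shared_var);
    out->global_in_global = __isGlobal(global_buf);
    out->global_in_local  = __isLocal(global_buf);

    global_buf[0] = shared_var;  // keep the values live
}

// Hypothetical host helper: launches one thread and copies the flags back.
SpaceFlags run_classify()
{
    SpaceFlags h = {};
    SpaceFlags *d_flags = nullptr;
    int *d_buf = nullptr;
    cudaMalloc(&d_flags, sizeof(SpaceFlags));
    cudaMalloc(&d_buf, sizeof(int));
    classify<<<1, 1>>>(d_flags, d_buf, 2);
    cudaMemcpy(&h, d_flags, sizeof(SpaceFlags), cudaMemcpyDeviceToHost);
    cudaFree(d_flags);
    cudaFree(d_buf);
    return h;
}

int main()
{
    SpaceFlags f = run_classify();
    printf("stack array  is local : %u\n", f.local_in_local);
    printf("shared var   is shared: %u\n", f.shared_in_shared);
    printf("global buf   is global: %u\n", f.global_in_global);
    printf("global buf   is local : %u\n", f.global_in_local);
    return 0;
}
```

I would expect the first three flags to be nonzero and the last to be zero, since the cudaMalloc pointer should lie outside the local window.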

When applied to an instruction that inherently has a particular logical space in view (for example LDL), it means that in some cases the address can be specified (via the instruction operand) as an offset from the start of this window, rather than as a full 64-bit “generic” address.
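
To make the two kinds of address concrete, here is a rough sketch using the address space conversion intrinsics `__cvta_generic_to_local` and `__cvta_local_to_generic` (available in recent CUDA toolkits). The struct `WindowProbe`, the kernel `probe` and the helper `run_probe` are hypothetical names, error checking is omitted, and the actual address values are implementation-dependent. Reading the difference between the two addresses as "the window base" is my assumption about what the hardware does, not documented behavior.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// What the kernel reports about one stack array (hypothetical type).
struct WindowProbe {
    unsigned long long generic_addr;  // full generic address of the array
    unsigned long long local_addr;    // same location as a local-space address
    unsigned round_trip_ok;           // 1 if local -> generic returns the pointer
    int value;                        // array element, read to keep the array live
};

__global__ void probe(WindowProbe *out, int idx)
{
    int local_arr[4];
    // Runtime indexing makes it likely the array lives in local memory.
    for (int i = 0; i < 4; ++i) local_arr[i] = i;
    local_arr[idx & 3] = 7;

    void *generic = local_arr;
    // Convert the generic pointer to an address in the local state space,
    // then convert it back to a generic pointer.
    size_t local = __cvta_generic_to_local(generic);
    void *back = __cvta_local_to_generic(local);

    out->generic_addr  = (unsigned long long)generic;
    out->local_addr    = (unsigned long long)local;
    out->round_trip_ok = (back == generic) ? 1u : 0u;
    out->value         = local_arr[(idx + 1) & 3];
}

// Hypothetical host helper: launches one thread and copies the result back.
WindowProbe run_probe()
{
    WindowProbe h = {};
    WindowProbe *d = nullptr;
    cudaMalloc(&d, sizeof(WindowProbe));
    probe<<<1, 1>>>(d, 2);
    cudaMemcpy(&h, d, sizeof(WindowProbe), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}

int main()
{
    WindowProbe p = run_probe();
    printf("generic address    : 0x%llx\n", p.generic_addr);
    printf("local-space address: 0x%llx\n", p.local_addr);
    // Assumption: if the local-space address is an offset into the window,
    // this difference would be where the window starts.
    printf("difference         : 0x%llx\n", p.generic_addr - p.local_addr);
    printf("round trip ok      : %u\n", p.round_trip_ok);
    return 0;
}
```

If you compile this and look at the SASS, the array accesses are where I would expect LDL/STL to show up, with the address given relative to the local space rather than as a full generic address.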

For programmers who navigate between CUDA C++ and PTX, this can sometimes create extra wrinkles to handle.
