inline-PTX question: Anyone could tell me the meaning of this?

Hi! I am learning PTX recently and meet this, it is for SGEMM, and I am puzzled by some grammar inside…

__device__ __forceinline__ void ldg32_nc_0(float &reg, const void *ptr) {
	asm volatile("{.reg .pred p;\n"
		"mov.b32 %0, 0;\n"
#if __CUDACC_VER_MAJOR__ >= 11 && __CUDACC_VER_MINOR__ >= 4 &&                 \
    __CUDA_ARCH__ >= 750
		"ld.global.nc.L2::128B.f32 %0, [%1];}\n"
#else
		"ld.global.nc.f32 %0, [%1];}\n"
#endif
		: "=f"(reg)
		: "l"(ptr));
}

__device__ __forceinline__ uint32_t smem_u32addr(const void *smem_ptr) {
	uint32_t addr;
	asm("{.reg .u64 u64addr;\n"
		" cvta.to.shared.u64 u64addr, %1;\n"
		" cvt.u32.u64 %0, u64addr;}\n"
		: "=r"(addr)
		: "l"(smem_ptr));

	return addr;
}

I have not learnt assembly before…My questions are:

  1. What is the “{” here? And why it ends several lines later? like: “{.reg .pred p;\n”
  2. What does this line mean? -----“ld.global.nc.L2::128B.f32 %0, [%1];}\n”
  3. What is the u64addr here? (I searched it on google, no result…) ------“{.reg .u64 u64addr;\n”

Thank you very much!!!

Please use code markup when copy-pasting text representing code. Either:
(a) Precede and follow pasted code with three backticks (```) on separate lines.
(b) Select pasted code, then click the </> button for pre-formatted text above the post input field.
The first method is preferred as it will also turn on syntax high-lighting.

(1) The curly braces work just like in C++. They establish a new block scope. You would want to use it any time new variables are declared to avoid name conflicts with variables declared elsewhere, in particular variables declared in other asm blocks.

(2) All PTX instruction should be explained in the PTX manual. In this case see

9.7.8.9. Data Movement and Conversion Instructions: ld.global.nc

Unless I overlooked something, the specifics of L2::128B don’t seem to be explained there, you may want to file an RFE requesting clarification of the documentation. However, it seems quite clear that this is a cache_level::prefetch_size, per the previous section:

9.7.8.8. Data Movement and Conversion Instructions: ld

(3) u64addr is the name of an unsigned 64-bit integer placed in registers (note preceding .reg .u64). In this code, it is assigned from formal argument %1, which is bound to smem_ptr. The name is freely chosen by the programmer. They could have used any other name, like foo, bar, or baz.

1 Like

Thank you very much for your detailed answer!!! I also learnt how to use syntax high-lighting. One further question might be…how to file an RFE requesting clarification? Where and how? Thank you!!!

also for the { } here, maybe you mean,

__device__ __forceinline__ void ldg32_nc_0(float &reg, const void *ptr) {
	asm volatile("{.reg .pred p;\n"
		"mov.b32 %0, 0;\n"
#if __CUDACC_VER_MAJOR__ >= 11 && __CUDACC_VER_MINOR__ >= 4 &&                 \
    __CUDA_ARCH__ >= 750
		"ld.global.nc.L2::128B.f32 %0, [%1];}\n"
#else
		"ld.global.nc.f32 %0, [%1];}\n"
#endif
		: "=f"(reg)
		: "l"(ptr));
}

here we create a register variable p, and we want to make it a local variable only visible in this asm function?

One question might be, where to end this }? because in this example, you see, we even do not use this p variable, and do we really need this { }? Also, suppose we use it, can we end this } at the last time we use p? Like:

__device__ __forceinline__ void ldg32_nc_0(float &reg, const void *ptr) {
	asm volatile("{.reg .pred p;\n"
		"mov.b32 p;}\n"   <-----------------here!!!! see here!!!!! Can we end } here???
#if __CUDACC_VER_MAJOR__ >= 11 && __CUDACC_VER_MINOR__ >= 4 &&                 \
    __CUDA_ARCH__ >= 750
		"ld.global.nc.L2::128B.f32 %0, [%1];}\n"
#else
		"ld.global.nc.f32 %0, [%1];\n"
#endif
		: "=f"(reg)
		: "l"(ptr));
}

Or we end this } later, in the last line of this function? Like:

__device__ __forceinline__ void ldg32_nc_0(float &reg, const void *ptr) {
	asm volatile("{.reg .pred p;\n"
		"mov.b32 %0, 0;\n"
#if __CUDACC_VER_MAJOR__ >= 11 && __CUDACC_VER_MINOR__ >= 4 &&                 \
    __CUDA_ARCH__ >= 750
		"ld.global.nc.L2::128B.f32 %0, [%1];\n"
#else
		"ld.global.nc.f32 %0, [%1];\n"
#endif
		: "=f"(reg)
		: "l"(ptr));} <<<-------------------------here!!!!! see here!!!! End it in the last line!!!
}

Thank you!!!

The normal usage pattern is what you see in the examples posted in this thread. If an asm block defines variables the entire assembly code sequences is enclosed in curly braces.

The fact that one of the examples defines a predicate without ever using it would appear to be a simple artifact of the development process. Presumably p was used at one point, and when the code changed later and it was not needed any more there was incomplete cleanup.

Personally, I have not needed anything besides this pattern (and generally I advise against programming at PTX level unless some GPU functionality is only available in this fashion), but you could always experiment with other approaches to see how it plays out. For example two blocks in curly braces within the same asm statement, or nested blocks in curly braces.

Generally speaking I advocate learning by experiment accompanied by frequent reference to documentation. This is how programmers learned before the internet was a widely available resource, and they learned quite thoroughly in this fashion.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.