.loc in PTX code

I compiled my CUDA code into PTX and I got the following code section:

.local .align 4 .b8 	__local_depot0[8];
	.reg .b64 	%SP;
	.reg .b64 	%SPL;
	.reg .pred 	%p<20>;
	.reg .b32 	%r<42>;
	.reg .f64 	%fd<99>;
	.reg .b64 	%rd<23>;
	.loc	1 37 0

What is the .loc in PTX?

See the .loc directive in the PTX ISA documentation.
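For reference, the basic form is .loc file_index line column. It tags the PTX instructions that follow it with a source location, and file_index refers to an entry declared by a .file directive elsewhere in the module. So .loc 1 37 0 points at line 37, column 0 of file 1. These directives are typically emitted when compiling with -lineinfo or -G. The Python sketch below is a hypothetical helper (map_ptx_to_source is not part of any NVIDIA tool) that pairs each instruction with the most recent .loc. It only handles the basic three-field form and ignores any extra fields newer PTX versions may append.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# .file <index> "<path>"  (optional fields after the path are ignored)
FILE_RE = re.compile(r'^\s*\.file\s+(\d+)\s+"([^"]*)"')
# .loc <file_index> <line> <column>  (any trailing fields are ignored)
LOC_RE = re.compile(r'^\s*\.loc\s+(\d+)\s+(\d+)\s+(\d+)')


@dataclass(frozen=True)
class SourceLoc:
    file: str
    line: int
    column: int


def map_ptx_to_source(ptx: str) -> List[Tuple[Optional[SourceLoc], str]]:
    """Pair each PTX instruction with the most recent .loc preceding it."""
    lines = ptx.splitlines()

    # Pass 1: build the file table; .file may appear anywhere in the module.
    files = {}
    for text in lines:
        m = FILE_RE.match(text)
        if m:
            files[int(m.group(1))] = m.group(2)

    # Pass 2: walk the lines, tracking the current source location.
    current: Optional[SourceLoc] = None
    result = []
    for text in lines:
        stripped = text.strip()
        m = LOC_RE.match(stripped)
        if m:
            idx, line, col = (int(g) for g in m.groups())
            current = SourceLoc(files.get(idx, f"<file {idx}>"), line, col)
        elif stripped.endswith(";") and not stripped.startswith((".", "//")):
            # Instructions (including predicated ones such as "@%p1 bra ...;")
            # end with ';' and, unlike directives like .reg, don't start with '.'.
            result.append((current, stripped))
    return result


if __name__ == "__main__":
    ptx = """
    .file 1 "kernel.cu"
    .reg .b32 %r<42>;
    .loc 1 37 0
    mov.u32 %r1, %tid.x;
    .loc 1 38 5
    add.s32 %r2, %r1, 1;
    """
    for loc, insn in map_ptx_to_source(ptx):
        print(f"{loc.file}:{loc.line}:{loc.column}  {insn}")
```

Because .file entries are collected in a separate first pass, the sketch does not depend on where the compiler places them in the module.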


Thanks. Is there a convenient way to debug code at PTX level?

You can certainly debug an executable built from PTX with cuda-gdb, but I don’t think you can do “source-level” debugging of the PTX itself. You might want to ask this question on the cuda-gdb forum.

I see. So PTX is only a way to check what resources are used to execute a kernel. It is not for execution debugging.

PTX is both a virtual instruction set architecture (ISA) and a compiler intermediate format. Its purpose is to abstract away the differences between the constantly changing GPU microarchitectures, which are generally not binary compatible with each other. The NVVM part of the CUDA compiler (derived from LLVM) generates PTX code and applies high-level, machine-independent optimizations. The PTX code is then compiled by ptxas, another optimizing compiler, into machine code (SASS). ptxas applies some general but mostly machine-specific optimizations and is responsible for register allocation, instruction selection, and instruction scheduling.
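To look at each of these stages separately, the toolchain can be driven one step at a time. The sketch below only builds the command lines and does not run them; the function name ptx_pipeline_commands, the file names, and the default sm_80 target are assumptions for illustration. -lineinfo makes nvcc emit the .file/.loc directives from the original question, ptxas -v reports per-kernel resource usage such as registers and local memory, and cuobjdump -sass disassembles the machine code that ptxas produced.

```python
from typing import List


def ptx_pipeline_commands(source: str, arch: str = "sm_80",
                          lineinfo: bool = True) -> List[List[str]]:
    """Build (but do not run) argv lists for CUDA C++ -> PTX -> SASS."""
    stem = source.rsplit(".", 1)[0]
    ptx_file, cubin_file = stem + ".ptx", stem + ".cubin"

    # Stage 1: nvcc front end + NVVM, stopping after PTX generation (-ptx).
    # -lineinfo adds .file/.loc directives mapping PTX back to the source.
    to_ptx = ["nvcc", "-ptx", f"-arch={arch}"]
    if lineinfo:
        to_ptx.append("-lineinfo")
    to_ptx += [source, "-o", ptx_file]

    # Stage 2: ptxas compiles the PTX into machine code for one GPU target;
    # -v prints per-kernel resource usage (registers, local memory, ...).
    to_sass = ["ptxas", f"-arch={arch}", "-v", ptx_file, "-o", cubin_file]

    # Stage 3: disassemble the cubin to inspect the SASS that ptxas emitted.
    disasm = ["cuobjdump", "-sass", cubin_file]

    return [to_ptx, to_sass, disasm]


if __name__ == "__main__":
    for cmd in ptx_pipeline_commands("kernel.cu"):
        print(" ".join(cmd))
```

On a machine with the CUDA toolkit on PATH, each list could be passed to subprocess.run(cmd, check=True). Comparing the .ptx and SASS output for the same kernel is one way to see which optimizations happen in NVVM and which in ptxas.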

This model of evolving the hardware is markedly different from the long-term binary compatibility maintained in the x86 world. NVIDIA’s approach lets the GPU hardware evolve faster than x86 CPUs do, while at the same time increasing the challenges of software development in the lowest layers of the software stack. Although NVIDIA may still be perceived as a hardware company, according to public statements by company representatives it employs more software engineers than hardware engineers, and it is actually a fully integrated systems company.


Very clear. Thanks! Nvidia is definitely the next company I would like to join.