Well, I'm still not sure how and where the PTX virtual assembler instructions are translated to microcode. Is this done by ptxas or by the hardware?
Section 2.1 of PTX_ISA_1.0.pdf says:
“…a function is compiled to the PTX instruction set and the resulting kernel is translated at install time to the target GPU…”
So I understood that this is done by the hardware during the first global kernel call, which would mean the cubin contains only PTX instructions and no microcode.
Please correct me if I'm wrong.
Because the microcode is not documented and can change dramatically from one GXX generation to the next, I think it is very hard to try to understand it. PTX seems to be more stable. But it does not describe HOW things are executed on the hardware, only WHAT is executed. It can be understood as an IR (intermediate representation) of a language, right now C.
From my point of view, the reasons for a PTX virtual emulator are:
- validation of new languages which produce PTX code (e.g. by using llvm.org)
- debugging at the PTX level
- memory access analysis (global access patterns, shared memory, etc.)
The current debug emulator mode is useful for validating the functionality of the algorithms and kernels themselves (the output, flow, etc.) but does not allow any deeper analysis.
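To make the memory access analysis point a bit more concrete, here is a rough sketch of what such a tool could do with a trace of per-thread addresses. Everything in it is hypothetical: the function names, the group size of 16 threads, and the 64-byte segment size are my own assumptions for illustration, not something taken from the PTX docs or from any existing emulator.

```python
# Hypothetical sketch: none of these names come from CUDA or a real emulator.
HALF_WARP = 16       # assumed number of threads whose accesses are grouped
SEGMENT_BYTES = 64   # assumed memory segment size in bytes


def segments_touched(addresses, segment_bytes=SEGMENT_BYTES):
    """Return the set of segment indices hit by one group of byte addresses."""
    return {addr // segment_bytes for addr in addresses}


def analyze_trace(trace, group_size=HALF_WARP, segment_bytes=SEGMENT_BYTES):
    """Count the distinct segments touched by each group of threads.

    trace: list of byte addresses, one per thread, in thread-ID order.
    Returns one count per group of `group_size` consecutive threads.
    """
    counts = []
    for start in range(0, len(trace), group_size):
        group = trace[start:start + group_size]
        counts.append(len(segments_touched(group, segment_bytes)))
    return counts


# Unit-stride 4-byte loads: thread i reads byte address 4*i.
unit_stride = [4 * i for i in range(32)]
# Large stride: thread i reads byte address 128*i.
strided = [128 * i for i in range(32)]

print(analyze_trace(unit_stride))  # [1, 1]
print(analyze_trace(strided))      # [16, 16]
```

Under these assumptions, a low count per group suggests tightly packed accesses, while a count equal to the group size means every thread touches its own segment. A PTX-level emulator could collect the address trace from the load and store instructions it interprets, which the current debug emulator mode does not expose.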