PTX consistency & documentation

Hello guys,

I’m thinking about developing a chip-level emulator for the PTX virtual machine. I took a look at the PTX docs, and the structure and description don’t seem very complicated. Compared with, e.g., the old Z80 instruction set, there are surprisingly few instructions.

I also took a look at wumpus’ decuda code, and it seems that the PTX documentation is
not really complete and some things are not yet fully clear, so wumpus did a lot of research to figure out the “meaning” of the bits and bytes in the cubin files.

Do you guys have another source of information, or any ideas?


You’re confusing two things, I think:

  • ptx is a virtual assembly language, which is compiled to chip-specific microcode. It is there for convenience, but does not describe any existing hardware.

  • decuda (dis)assembles G8x and G9x microcode, a.k.a. cubin. This is the code that is really executed on the device. The microcode format is not documented by NVIDIA, so just about anything that is known about it is due to my research.

So it is not possible to write a chip-level emulator using ptx, as the chip doesn’t execute ptx. You’d need to have it execute cubin directly. If you have any questions about the instructions, let me know; I find this a very interesting project.

I could see a reason for a microcode-level emulator, but what advantages would a PTX-level emulator have over the C-level one that is built in?

Well, I’m still not sure how and where the PTX virtual assembler instructions are translated to microcode. Is this done by ptxas or by the hardware?

Section 2.1 of the PTX_ISA_1.0.pdf document says:

“…a function is compiled to the PTX
instruction set and the resulting kernel is translated at install time to the target GPU
instruction set”.

So I understood that this is done by the hardware during the first global kernel call, which would mean the cubin contains only PTX instructions, no microcode.
Please correct me if I’m wrong.

Because the microcode is not documented and can change dramatically from one
GXX generation to the next, I think it is very hard to try to understand it. PTX seems
to be more stable. But it does not describe HOW things are executed on the hardware,
only WHAT is executed. It can be understood as an IR (intermediate representation) of a language, which right now is C.

From my point of view, the reasons for a PTX virtual emulator are:

  • validation of new languages which produce PTX code (e.g. by using
  • debugging at the PTX level
  • memory access analysis (global access patterns, shared memory, etc, etc)

The current debug emulator mode is useful for validating the functionality of the algorithms and kernels themselves (the output, flow, etc.) but does not allow any deeper analysis.
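
To make the memory access analysis point a bit more concrete, here is a minimal Python sketch of the kind of report such an emulator could produce from a trace of per-thread global memory addresses. Everything in it is an assumption for illustration: the function names are made up, the half-warp size of 16 and the 64-byte segment size are just parameters, and counting distinct aligned segments is a simplification, not the exact coalescing rule of any particular chip.

```python
# Hypothetical sketch: the names and the "distinct aligned segments" model
# are illustrative assumptions, not the hardware's real coalescing rules.

HALF_WARP = 16       # threads assumed to be serviced together
SEGMENT_BYTES = 64   # assumed size of one aligned memory segment


def segments_touched(addresses, segment_bytes=SEGMENT_BYTES):
    """Return the set of aligned segment indices hit by the byte addresses."""
    return {addr // segment_bytes for addr in addresses}


def analyze_access(addresses, half_warp=HALF_WARP, segment_bytes=SEGMENT_BYTES):
    """Split per-thread addresses into half-warps; count segments for each."""
    counts = []
    for start in range(0, len(addresses), half_warp):
        group = addresses[start:start + half_warp]
        counts.append(len(segments_touched(group, segment_bytes)))
    return counts


# 32 threads, each reading one 4-byte word.
contiguous = [4 * tid for tid in range(32)]     # thread i reads word i
strided = [4 * 16 * tid for tid in range(32)]   # thread i reads word 16*i

print(analyze_access(contiguous))  # one segment per half-warp
print(analyze_access(strided))     # one segment per thread
```

A low count per half-warp suggests the accesses could be combined into few transactions; a count equal to the number of threads means every thread touches its own segment.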


It’s done by ptxas. Cubin is directly executed by the hardware, which is why it has convenient RISC instructions. It’s also the same shader format as used for fragment/vertex/geometry shaders.

And yes, the cubin format could change (even dramatically) between one generation and the next, but I don’t expect this to happen any time soon. At least all G8x and G9x will stick with this format.

PTX is not really suited for emulation, as you have a huge number of registers. Also, it is heavily optimised by ptxas before outputting cubin.

Agreed, in this case a direct PTX emulator is not really useful. A cubin emulator, which iterates over the RISC instructions, would be cool.
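
Just to sketch what "iterating over the RISC instructions" could look like, here is a toy fetch-decode-execute loop in Python. Note that the opcodes (`mov`, `add`, `mul`, `exit`) and the tuple encoding are invented for illustration only; the real cubin microcode format is undocumented and a real emulator would first need a decoder for it, which this sketch does not attempt.

```python
# Toy fetch-decode-execute loop. The opcodes and the tuple encoding are
# invented for illustration; they are NOT the real G8x/G9x microcode format.

MASK32 = 0xFFFFFFFF  # keep results in 32-bit registers


def run(program, num_regs=8, max_steps=1000):
    """Execute a list of (opcode, *operands) tuples; return the registers."""
    regs = [0] * num_regs
    pc = 0
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op, *args = program[pc]   # fetch and decode
        pc += 1
        if op == "mov":           # mov dst, immediate
            dst, imm = args
            regs[dst] = imm & MASK32
        elif op == "add":         # add dst, src_a, src_b
            dst, a, b = args
            regs[dst] = (regs[a] + regs[b]) & MASK32
        elif op == "mul":         # mul dst, src_a, src_b
            dst, a, b = args
            regs[dst] = (regs[a] * regs[b]) & MASK32
        elif op == "exit":
            break
        else:
            raise ValueError(f"unknown opcode: {op!r}")
    return regs


# Computes (5 + 7) * 3 into register 2.
program = [
    ("mov", 0, 5),
    ("mov", 1, 7),
    ("add", 2, 0, 1),
    ("mov", 3, 3),
    ("mul", 2, 2, 3),
    ("exit",),
]

print(run(program)[2])
```

The same loop structure would apply per thread; modelling many threads, shared memory and divergence is where the real work would be.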

Could you please send me some information about the instructions in the cubin file, so
I don’t need to debug the decuda Python code? :-)

Thanks a lot,


If you have specific questions, I can try to answer them. For some general information, you could look at the README file that comes with decuda.