The join instruction would be really handy if we could just hard code it in ptx…
If an assembler could be worked out, we could have a completely independent tool chain (if I’m allowed to release my front end). But I guess the assembler may need some more work after G92 comes out.
My boss would surely be mad at me if I post THAT ptx:(
Well, never mind, that’s not a big issue, I could just delete it anyway.
It also seems to have a problem disassembling pass 1 of my scan.
Yes, the next logical step from here is an assembler, so you can at least edit the code and recompile. Or experiment with new combinations of instructions.
What kind of front-end are you working on?
Edit: yes, just send it to my mail, or something that looks like it but has the same problem :)
Just a non-optimizing front end for a C++-like language for CUDA, to address certain issues.
I started writing it when I got fed up with 0.8's bugs. The 1.0 compiler still turns out to over-optimize, so I kept working on it.
It appears that a DX10 constant buffer = constant memory. Both support 16 segments, although CUDA generally only uses 0 (global constant memory), 1 (kernel local constant memory) and 14 (relocations).
The join instruction is a hint to help the hardware handle divergent branches. Placed before a point of divergence, it indicates where the control paths will merge again.
A typical if-then-else block will be compiled to:
# condition in p1
join endif # both paths merge at endif
@p1.eq br else # if not p1 goto else
# here paths may diverge
# code when p1 is true
br endif
else:
# code when p1 is false
endif:
nop.join # merge control paths again
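To make the divergence/merge idea above a bit more concrete, here is a rough C++ sketch of a toy warp running the same if-then-else. It is only a mental model under my own assumptions, not how the hardware actually implements join: the names `Warp`, `kLanes` and `run_if_else` are made up for illustration, and the 8-lane width is arbitrary.

```cpp
#include <array>
#include <cstdint>

// Toy model only: hypothetical names, arbitrary 8-lane "warp".
constexpr int kLanes = 8;

struct Warp {
    std::uint32_t active = (1u << kLanes) - 1;  // one bit per lane, all lanes on
    std::array<int, kLanes> out{};
};

// Per lane: if (pred) out = 2 * in; else out = -in;
// Returns how many serialized passes were needed (1 if uniform, 2 if diverged).
int run_if_else(Warp& w,
                const std::array<bool, kLanes>& pred,
                const std::array<int, kLanes>& in) {
    // "join endif": remember which lanes have to meet again at endif.
    const std::uint32_t saved = w.active;

    // Split the active lanes by the predicate; this is where paths may diverge.
    std::uint32_t taken = 0;
    for (int i = 0; i < kLanes; ++i) {
        if (((saved >> i) & 1u) && pred[i]) taken |= (1u << i);
    }
    const std::uint32_t not_taken = saved & ~taken;

    int passes = 0;
    if (taken != 0) {  // code when the predicate is true
        w.active = taken;
        ++passes;
        for (int i = 0; i < kLanes; ++i)
            if ((w.active >> i) & 1u) w.out[i] = 2 * in[i];
    }
    if (not_taken != 0) {  // code when the predicate is false
        w.active = not_taken;
        ++passes;
        for (int i = 0; i < kLanes; ++i)
            if ((w.active >> i) & 1u) w.out[i] = -in[i];
    }

    // "nop.join": control paths merge, the original lane set runs together again.
    w.active = saved;
    return passes;
}
```

With an alternating predicate the sketch needs two passes, one per side of the branch; with a uniform predicate it needs only one, and in both cases the full lane mask is restored at the merge point.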
As far as I know, Intel isn’t publishing their internal microcode either - all that is published is the specification of the x86 instruction set and its extensions (SSE, MMX etc).
This is very much in line with what nVidia is doing with their PTX instruction set: they publish PTX while hiding the details of the microcode used by current graphics chip generations.