Cubin assembler now available: decuda 0.4.0 released

No… I'm not annoyed by its syntax… I'm annoyed because it doesn't show me the actual picture. I want to see where my registers are being used. I want to optimize my code :D

If you know, can you tell me how to run decuda?

Sarnath

That's awesome, wumpus… indeed it will be a nice tool chain to have.
Keep up the good work, man!

Cheers!!!
Sandeep

I don't know either; I have never used it. But you can always download it and go through the documentation.

IMHO,

If your algorithm is good enough, one should stay away from optimizations at this level unless and until there is a driving need for it.

I still remember a quote from the author of a book (Zen of Graphics Programming?): “The best optimizer is between your ears.”

OK… I have found the way, with help from wumpus.

Wumpus, I would like to thank you for your impressive work. It is a big help for people like me who want to take control of the code (I'm writing a chess engine) and of all the hardware features inside the NVIDIA GPU.

Great great work!

Hello, very nice tool!

Made me a happier man to see how registers are actually allocated!

I was curious whether you have seen any signs of the interpolators that are supposed to exist in the SFU (Special Function Unit). They are apparently there to help pixel shaders interpolate vertex attributes, and as far as I can make out they are not exposed in the CUDA APIs.

Perhaps accessing them through the assembler would be possible? Unfortunately I have no more insight to offer as to how they might be invoked, but if I didn't misread, I gathered that you had traced the shader binaries being sent down to the HW?

cheers
.ola

P.S.
IEEE Micro: “NVIDIA Tesla: A Unified Graphics and Computing Architecture”

Yes, I have traced various kinds of shaders in the HW. Vertex shaders have some special instructions to write to varyings (output registers), which are interpolated for the fragment shaders. Also, there are instructions to read incoming values, used in fragment programs.

As far as I know you cannot use this in CUDA, because the values are already interpolated at the start of the shader; you have no programmatic control over them in your kernel. Also, using these instructions causes hardware exceptions (Xid stuff…).

Edit: thanks very much to the people that sent me the paper, it is truly interesting

Hi wumpus,

decuda and cudasm are extremely helpful while trying to optimize code. Thanks a lot.

I encountered a problem, though. I'm attaching a cubin for matrix multiply. If I disassemble it with decuda and then assemble it again with cudasm, the program gives incorrect output, whereas the original cubin works fine as is. Maybe this is a bug in cudasm? Could you please take a look? Thanks!

There are a lot of bugs in cudasm; it was more of a proof of concept, and I didn't get around to making it foolproof.

cudasm doesn't have much use, because mucking about with PTX kernels (especially machine-code kernels) is a fool's errand. Doing this takes far more coding time and maintenance effort than the benefit is worth, and it will break as soon as NVIDIA makes changes to its architecture. (The only thing it might make sense for is matrix multiply, since it's simple and the focus of competitions.) At the very least, it's of little use until non-inline functions are supported, so you could link normal code against asm-optimized inner loops. Even then, though, it's dangerous for you and society at large.

decuda, though, is pure gold.

Ok, finally got decuda working and it’s great!

I tried the latest stable Python from the official Python site, and also from ActiveState, and both gave me problems with the modules cStringIO and StringIO. After a lot of digging, it turns out these were standard modules but have been removed in Python 3.0. So if you're going to get Python for running decuda, get Python 2.6.1!

I’m on Windows Vista.
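For anyone hitting the same error, here is a minimal sketch (my own, not code from decuda) of what the problem looks like: Python-2-era code imports the cStringIO/StringIO modules, which were removed in Python 3.0 and replaced by io.StringIO. The decuda-style assembly line below is just a made-up placeholder.

```python
# Python-2-era code (like decuda) does: from cStringIO import StringIO
# On Python 3 those modules no longer exist; io.StringIO is the replacement.
try:
    from cStringIO import StringIO  # Python 2: fast C implementation
except ImportError:
    from io import StringIO         # Python 3 equivalent

buf = StringIO()
buf.write("mov.b32 $r0, $r1\n")  # hypothetical disassembly output line
print(buf.getvalue())
```

Of course, the shim only helps if you can edit the code; with a stock decuda, sticking to Python 2.6 is the simpler fix.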

An off-topic but general FYI: Python 3.0 is a backward-incompatible release designed to fix some long-standing design “bugs” in Python. Unless you are using code which explicitly says “Written for Python 3” or you want to experiment with the future of the language, you should stay away from Python 3. (Not that it is bad, it is just a “forward looking” release.) There are essentially zero useful programs that automatically work with both Python 2 and 3.

It would be nice if people who packaged Python made this more clear. (Tell your friends!)
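To illustrate the kind of backward-incompatible change involved (a generic example of mine, not related to decuda specifically): even plain integer division changed meaning in Python 3.

```python
# In Python 2, 7 / 2 evaluates to 3 (truncating integer division).
# In Python 3, 7 / 2 evaluates to 3.5, and floor division needs //.
print(7 // 2)  # 3 on both Python 2 and Python 3
print(7 / 2)   # 3 on Python 2, 3.5 on Python 3
```

Silent behavior changes like this are why a Python 2 program can "run" under Python 3 and still produce wrong answers, rather than failing cleanly.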

Awesome tool!