I’d like to announce that the most recent version of decuda, my disassembler for .cubin instructions for the G8x/G9x architectures, now includes an assembler. It allows writing and optimizing code specifically for the G8x and G9x series, and completes the independent toolchain for this hardware. It takes a text file with assembly instructions as input and produces a .cubin file as output.
The instruction set is not yet fully supported, but the assembler can already assemble working CUDA kernels, including predication and flow control constructs.
Also, decuda can export in a format that should be reparsable by cudasm, so that it is possible to make changes to the code produced by nvcc and reassemble it.
The software can be found here: http://www.cs.rug.nl/~wladimir/decuda/
The assembler is still in beta and barely documented, so let me know if you have any questions about its use or if you find nasty bugs.