OK, I’ve gotten the hang of the code repository somewhat. It’s actually a fairly powerful and simple-to-use (though poorly documented!) feature.
Basically, it works like this:
You can change kernels without recompiling the executable by simply creating or replacing files in a special directory. If your executable is called “./L33tProg.exe,” the runtime will automatically check for kernel implementations in a folder (or tar file!) called “./L33tProg.devcode”. The kernels can be either in ptx or cubin form.
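If you want to see what is actually sitting in that folder before you run, here is a minimal Python sketch. It is not part of the toolkit, and the helper names (find_devcode_repos, list_repo_files) are made up for illustration. It assumes the repository sits next to the executable and is named either L33tProg.devcode or L33tProg.exe.devcode, and it simply lists every file it finds without assuming anything about how the runtime lays them out.

```python
import tarfile
from pathlib import Path


def find_devcode_repos(exe_path):
    """Return the candidate code repositories that exist next to exe_path.

    Assumption: the repository is named either <name>.devcode or
    <name>.exe.devcode, so both spellings are checked.
    """
    exe = Path(exe_path)
    candidates = [
        exe.with_suffix(".devcode"),           # L33tProg.devcode
        exe.with_name(exe.name + ".devcode"),  # L33tProg.exe.devcode
    ]
    found = []
    for candidate in candidates:
        # Skip duplicates (the two spellings coincide when there is no suffix).
        if candidate.exists() and candidate not in found:
            found.append(candidate)
    return found


def list_repo_files(repo):
    """List the files inside a repository that is a folder or a tar file."""
    repo = Path(repo)
    if repo.is_dir():
        return sorted(
            p.relative_to(repo).as_posix() for p in repo.rglob("*") if p.is_file()
        )
    if repo.is_file() and tarfile.is_tarfile(str(repo)):
        with tarfile.open(str(repo)) as archive:
            return sorted(m.name for m in archive.getmembers() if m.isfile())
    return []


if __name__ == "__main__":
    for repo in find_devcode_repos("./L33tProg.exe"):
        print(repo)
        for name in list_repo_files(repo):
            print("   ", name)
```

Running it before and after dropping in a replacement kernel is a cheap way to confirm the file really landed where you think it did.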
Creating the ptx/cubin files is a bit difficult, though, because there are restrictions of some sort on the contents of the files, and there’s no error reporting: the only sign of a problem is that your kernel appears to run fine but gives broken results. To avoid starting from scratch, you can add the “-dir=$(ProjectName).exe.devcode -ext=all -int=none -arch compute_10 -code compute_10,sm_10,sm_11” compile flags. These will generate the L33tProg.devcode folder that contains all the kernels in your project. The flags also make it so that the executable requires this folder and doesn’t have embedded kernels itself. That’s a debugging trick so you never have doubts about whether the new kernel gets loaded.
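For anyone building outside Visual Studio, where $(ProjectName) isn’t expanded for you, here is a rough sketch of the same flags assembled into a full nvcc command line. The function name devcode_nvcc_command and the source file kernel.cu are hypothetical; the flags themselves are exactly the ones quoted above, plus nvcc’s ordinary -o for the output name.

```python
import shlex


def devcode_nvcc_command(project_name, sources, exe_suffix=".exe"):
    """Build an nvcc argument list that writes all kernels to a code repository.

    Mirrors the flags from the post; project_name stands in for the
    $(ProjectName) macro that Visual Studio would expand.
    """
    exe = project_name + exe_suffix
    return [
        "nvcc",
        "-dir=" + exe + ".devcode",  # where the repository gets generated
        "-ext=all",                  # put every kernel in the repository
        "-int=none",                 # embed nothing in the executable
        "-arch", "compute_10",
        "-code", "compute_10,sm_10,sm_11",
        "-o", exe,
        *sources,
    ]


if __name__ == "__main__":
    # kernel.cu is a placeholder source file name.
    print(shlex.join(devcode_nvcc_command("L33tProg", ["kernel.cu"])))
```

Keeping the flags in a list rather than one long string makes it easy to hand the result straight to subprocess.run without worrying about quoting.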
However, programming in ptx is turning out to be especially difficult with no debugging. In fact, you’re not even really informed when the kernel totally fails; the only hint is that the output data looks a certain way. ptxas doesn’t emit informative syntax errors either, and sometimes just dies with “internal error” and no line number. It would be great if ptx could be compiled to host code and debugged just like cu. Ah… I dream of inline ptx. And of a better assembly syntax… maybe something that looks like C but isn’t (so a mov is an assignment, a cvt is a cast, an ALU op is an expression… but you can only do one thing per line). Hell, ptx ain’t real assembly anyway.
p.s. the proper docs are in chapter 6 of C:\CUDA\doc\NVCC_1.0.pdf