Driver API: PTX or CUBIN modules?

With the release of SDK 2.2.1, the driver API samples load PTX kernel modules instead of CUBIN ones. From the release notes:

What is the reason behind the change? I would like to know if there are any advantages to using PTX over CUBIN.

We’ve never claimed that cubins are forward compatible, whereas forward compatibility is the whole reason PTX exists. Plus, if we add new instructions to future hardware, the JIT compiler can use them when it compiles your PTX, so there is room for additional optimizations that a prebuilt cubin can never pick up.

Thanks for clarifying this. Does cuModuleLoadData with a PTX string automatically compile for the SM architecture of the GPU it is running on? (Assuming no optional flags in the nvcc command that generated the PTX.)

Speaking of the driver API and cuModuleLoadData, another thing I wondered is whether it will support JIT compiling of kernels written in C (like OpenCL). The “CUDA Architecture Overview” PDF mentions this:

Is that a mistake in the document? I couldn’t find any reference to that in the manuals.

I’m not actually sure about the JITting. I think so, though.

And no, there’s no support for JITting directly from CUDA C; that’s a mistake, or at best an incredibly murky description.