My second question:
I'm using the CUDA 4.0 toolkit with a compute capability 2.x (Fermi) device, a GTX 460.
What is the difference between a cubin file and a PTX file?
My understanding is that a cubin contains native machine code for the GPU, so it is architecture-specific, while PTX is an intermediate language that runs on Fermi devices (e.g. the GeForce GTX 460) via JIT compilation. When compiling a .cu source, I can choose either a PTX or a cubin target: for a cubin I use “code=sm_20”, and for a PTX file I use “code=compute_20”.
Is that correct?
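Yes, essentially. sm_20 and compute_20 refer to the same compute capability (2.0), but they select different kinds of code: compute_xx names a virtual architecture, so code=compute_20 embeds PTX, while sm_xx names a real architecture, so code=sm_20 produces a cubin.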
You're right that PTX is an intermediate language. When emitting PTX you specify the minimum architecture it targets, so the code can take advantage of instructions available on newer hardware.
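A minimal sketch of how the two targets map onto nvcc's `-gencode` option, written in Python only to build the argument lists. The helper names `gencode_args` and `nvcc_fatbin_command` and the file names `kernel.cu`/`kernel.o` are hypothetical. In each pair, `arch=` names the virtual architecture the PTX is generated for; `code=sm_XX` asks for a cubin and `code=compute_XX` embeds the PTX itself.

```python
def gencode_args(cc: str, target: str) -> list[str]:
    """Return one nvcc -gencode option pair for compute capability `cc` (e.g. "20").

    target="cubin": code=sm_<cc>, native machine code for that real architecture.
    target="ptx":   code=compute_<cc>, PTX that the driver JIT-compiles at load time.
    """
    if target == "cubin":
        code = f"sm_{cc}"
    elif target == "ptx":
        code = f"compute_{cc}"
    else:
        raise ValueError(f"unknown target: {target!r}")
    # arch= names the virtual architecture (minimum feature set) used to generate PTX.
    return ["-gencode", f"arch=compute_{cc},code={code}"]


def nvcc_fatbin_command(source: str, output: str, cc: str = "20") -> list[str]:
    """Build an nvcc command that embeds both a cubin and PTX for `cc` in one object."""
    return (["nvcc", "-c", source, "-o", output]
            + gencode_args(cc, "cubin")
            + gencode_args(cc, "ptx"))


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run() to invoke nvcc.
    print(" ".join(nvcc_fatbin_command("kernel.cu", "kernel.o")))
```

Embedding both gives native code for the current GPU plus a PTX fallback that newer GPUs can JIT-compile. To write the files themselves instead of an object, nvcc's `-cubin` and `-ptx` options can be used in place of `-c`.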
Think of it like this: a cubin is architecture-specific and NOT forward-compatible, so a cubin targeting the pre-Fermi GT200 architecture (sm_13) won't run on your Fermi card. PTX, however, is forward-compatible: if you generated PTX for compute_13, the driver would JIT-compile it and it would still run on your Fermi card, though it wouldn't use any of the new PTX instructions introduced for Fermi (compute_20).
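As a rough illustration, the rule above can be written as a small Python model. The names `KernelImage` and `can_load` are hypothetical and not part of any CUDA API; the real check is done by the CUDA driver when a module is loaded. The cubin branch also encodes the minor-revision rule from the CUDA C Programming Guide, which this answer does not spell out: a cubin built for compute capability X.y runs on X.z with z ≥ y, but never across major versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelImage:
    """Code embedded for one target: a cubin for sm_XY or PTX for compute_XY."""
    kind: str                # "cubin" or "ptx"
    target: tuple[int, int]  # (major, minor) compute capability, e.g. (1, 3) for GT200


def can_load(image: KernelImage, device: tuple[int, int]) -> bool:
    """Simplified loading rule for CUDA 4.0-era GPUs (sm_1x, sm_2x)."""
    if image.kind == "cubin":
        # Native code: same major version only, on an equal or newer minor revision.
        return device[0] == image.target[0] and device[1] >= image.target[1]
    if image.kind == "ptx":
        # PTX is JIT-compiled for any device at least as new as the PTX target
        # (tuples compare lexicographically: major first, then minor).
        return device >= image.target
    raise ValueError(f"unknown kind: {image.kind!r}")


fermi = (2, 0)
print(can_load(KernelImage("cubin", (1, 3)), fermi))   # False: GT200 cubin on Fermi
print(can_load(KernelImage("ptx", (1, 3)), fermi))     # True: compute_13 PTX is JIT-compiled
print(can_load(KernelImage("cubin", (2, 0)), (2, 1)))  # True: same major, newer minor
```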