CUDA 8 NVCC creates PTX with .version 5.0


We have switched to CUDA 8.0 and compile our kernels with NVCC using the command line:

-arch sm_20 -ptx

We store the resulting PTX files as local resources and load them on demand from the file system using the driver API (cuModuleLoadDataEx). The compute capability is set to the lowest value so that we can support as many systems as possible.

Before CUDA 8 everything worked fine, but now compiling with the command line above results in a “binary not for this GPU” error when loading the PTX code through the JIT compiler on a K20.

I always thought that PTX code is backwards compatible when using the -arch flag, but since the ISA version is also burnt into the PTX code, reading the documentation makes me question whether this is really true.


You should probably take a look at the CUDA vectorAddDrv sample code. When I build and run that code as-is, it creates a PTX kernel file with the following header:

// Generated by NVIDIA NVVM Compiler
// Compiler Build ID: CL-21124049
// Cuda compilation tools, release 8.0, V8.0.44
// Based on LLVM 3.4svn

.version 5.0
.target sm_20
.address_size 64

        // .globl       VecAdd_kernel

.visible .entry VecAdd_kernel(
        .param .u64 VecAdd_kernel_param_0,
        .param .u64 VecAdd_kernel_param_1,
        .param .u64 VecAdd_kernel_param_2,
        .param .u32 VecAdd_kernel_param_3
)
{
        .reg .pred      %p<2>;
        .reg .f32       %f<4>;

You’ll note that it is PTX ISA version 5.0, and the only target in that PTX file is sm_20.

If you study the Makefile, you’ll see that this is expected (only sm_20 is specified in GENCODE_FLAGS).

That sample code runs just fine on a K20 (I just tested it now on CUDA 8).

So I suspect there may be something else you are doing that is broken. You might want to study that example to see what is different.

Hi txbob,

indeed, we had a look at the generated PTX code. The problem is that we have customers who cannot update to CUDA 8, but our build environment is set up with CUDA 8, since otherwise we cannot use VS2015. In the end CUDA 6.5 would be sufficient for us, but we have to use VS2015.

Although NVIDIA states that PTX is backwards compatible, it seems that PTX code is not really backwards compatible when you are not using a driver that supports the same PTX ISA version. It would be nice to have an option to set the ISA version when creating the PTX code, but this is apparently hardcoded in the PTX “compiler”.

There is essentially no backward compatibility between CUDA versions.

Codes built under CUDA 7.5, whether we are talking about the driver API or the runtime API, should work with a CUDA 8 installation (i.e. a CUDA 8-compatible driver; I am assuming static linking for the runtime API case). This is “forward compatibility”.

But the reverse is generally not the case, and never has been AFAIK. A CUDA 8 code built against a CUDA 8 driver API target will require a CUDA 8 driver to run correctly. Even if we leave the PTX and JIT out of the picture, the compiled host code that makes calls into the CUDA driver API library, linked against CUDA 8 driver API library, would not be expected to work with a CUDA 7.5 driver.

Stated another way, pre-CUDA 8 drivers know nothing about PTX ISA 5.0, which was introduced with CUDA 8. Therefore they are unable to jit that code.

The backward compatibility mentioned in the CUDA PTX ISA docs generally doesn’t refer to this kind of CUDA version backward compatibility, in my view.

Since CUDA 8-compiled driver API code would not work without a CUDA 8 driver, I can only assume that the case you are talking about is dropping CUDA 8-compiled PTX code into an otherwise CUDA 7.5-compiled application (i.e. CUDA 7.5 driver API calls). I wouldn’t expect that to work, in general.

Being able to set the PTX ISA target for ptx compilation might be something you would consider filing an enhancement request for - just a bug with RFE in the description.

Unless I am misunderstanding something about the use case, there is no need for customers to “update to CUDA 8”. All they need to do is update to the latest NVIDIA drivers. Any recent driver package comes with the latest JIT compiler which allows PTX 5.0 to be JIT compiled to machine code for any GPU with compute capability >= 2.0.

My general recommendation would be to deliver fat binaries to customers, with PTX only for the latest architecture (for forward compatibility) and use JIT compilation only if dynamic code generation is a requirement, based on user input.