CUDA NVCC creates .target 5.0

MartyMcFly · January 11, 2017, 9:54am

Hello,

we have switched to CUDA 8.0 and create our kernels using NVCC with the command line:

-arch sm_20 -ptx

We store the resulting PTX files as local resources and load them on demand from the file system through the Driver API, using cuModuleLoadDataEx. The compute capability is set to the lowest supported value so that we can support as many systems as possible.

Before CUDA 8 everything worked fine, but now compiling with the command line above results in a “binary not for this GPU” exception when the PTX code is loaded and JIT-compiled for a K20.

I always thought that PTX code is backward compatible when the -arch flag is used, but the ISA version is also burnt into the PTX code, and after reading the documentation I am no longer sure this is really true.

Thanks
Martin

Robert_Crovella · January 11, 2017, 3:08pm

You should probably take a look at the CUDA vectorAddDrv sample code. When I build and run that code as-is, it creates a PTX kernel file with the following header:

//
// Generated by NVIDIA NVVM Compiler
//
// Compiler Build ID: CL-21124049
// Cuda compilation tools, release 8.0, V8.0.44
// Based on LLVM 3.4svn
//

.version 5.0
.target sm_20
.address_size 64

        // .globl       VecAdd_kernel

.visible .entry VecAdd_kernel(
        .param .u64 VecAdd_kernel_param_0,
        .param .u64 VecAdd_kernel_param_1,
        .param .u64 VecAdd_kernel_param_2,
        .param .u32 VecAdd_kernel_param_3
)
{
        .reg .pred      %p<2>;
        .reg .f32       %f<4>;
        ...

You’ll note that it is PTX ISA version 5.0 and the only target in that PTX file is sm_20.
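For anyone who wants to check these two directives on their own PTX files before shipping them, here is a minimal Python sketch. The helper name `read_ptx_header` and the `SAMPLE` string are made up for illustration and are not part of any CUDA tool; the sketch only assumes that `.version` and `.target` each appear at the start of a line, as in the listing above.

```python
import re


def read_ptx_header(ptx_text):
    """Return ((isa_major, isa_minor), target) from the PTX header directives.

    Hypothetical helper: it only scans the text and does not validate the PTX.
    """
    version = re.search(r"^\s*\.version\s+(\d+)\.(\d+)", ptx_text, re.MULTILINE)
    target = re.search(r"^\s*\.target\s+(\w+)", ptx_text, re.MULTILINE)
    if version is None or target is None:
        raise ValueError("missing .version or .target directive")
    isa = (int(version.group(1)), int(version.group(2)))
    return isa, target.group(1)


# Shortened copy of the header shown above.
SAMPLE = """\
//
// Generated by NVIDIA NVVM Compiler
//
.version 5.0
.target sm_20
.address_size 64
"""

isa, target = read_ptx_header(SAMPLE)
print(isa, target)  # (5, 0) sm_20
```

If `.target` lists several comma-separated options, this sketch returns only the first token.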

If you study the Makefile you’ll see that this is expected (only sm_20 is specified for GENCODE_FLAGS)

That sample code runs just fine on a K20 (I just tested it now on CUDA 8).

So I suspect there may be something else you are doing that is broken. You might want to study that example to see what is different.

MartyMcFly · January 11, 2017, 3:15pm

Hi txbob,

Indeed, we had a look at the generated PTX code. The problem is that we have customers who cannot update to CUDA 8, but our build environment is set up with CUDA 8 since otherwise we cannot use VS2015. In the end CUDA 6.5 would be sufficient for us, but we have to move to VS2015.

NVIDIA states that PTX is backward compatible, but it seems that PTX code is not really backward compatible when the driver does not support the same PTX ISA version. It would be nice to be able to set the ISA version when generating the PTX code, but apparently it is hardcoded in the PTX “compiler”.

Robert_Crovella · January 11, 2017, 3:45pm

There is essentially no backward compatibility between CUDA versions.

Codes built under CUDA 7.5, whether we are talking about the driver API or the runtime API, should work with a CUDA 8 installation (i.e. a CUDA 8 compatible driver; I am assuming static linking for the runtime API case). This is “forward compatibility”.

But the reverse is generally not the case, and never has been AFAIK. A CUDA 8 code built against a CUDA 8 driver API target will require a CUDA 8 driver to run correctly. Even if we leave the PTX and JIT out of the picture, the compiled host code that makes calls into the CUDA driver API library, linked against CUDA 8 driver API library, would not be expected to work with a CUDA 7.5 driver.

Stated another way, pre-CUDA 8 drivers know nothing about PTX ISA 5.0, which was introduced with CUDA 8. Therefore they are unable to JIT-compile that code.
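As a rough illustration of that rule, an application could compare the PTX ISA version of a file against the installed driver before attempting to load it. The Python sketch below is an assumption-laden example, not NVIDIA code: the table covers only a few entries taken from the PTX ISA release notes, `driver_can_jit` is a hypothetical name, and the driver version is assumed to be the integer reported by `cuDriverGetVersion` (1000 * major + 10 * minor, so 8000 for CUDA 8.0).

```python
# PTX ISA version -> first CUDA release whose driver understands it.
# Partial table, copied from the PTX ISA release notes.
PTX_ISA_TO_CUDA = {
    (4, 0): (6, 0),
    (4, 1): (6, 5),
    (4, 2): (7, 0),
    (4, 3): (7, 5),
    (5, 0): (8, 0),
}


def driver_can_jit(ptx_isa, driver_version):
    """Return True if a driver of the given version should know this PTX ISA.

    driver_version uses the cuDriverGetVersion encoding
    (1000 * major + 10 * minor). Hypothetical helper for illustration.
    """
    cuda_major, cuda_minor = PTX_ISA_TO_CUDA[ptx_isa]
    return driver_version >= 1000 * cuda_major + 10 * cuda_minor


print(driver_can_jit((5, 0), 7050))  # False: a CUDA 7.5 driver predates PTX 5.0
print(driver_can_jit((5, 0), 8000))  # True
print(driver_can_jit((4, 3), 8000))  # True: older PTX on a newer driver
```

A check like this only explains the failure up front; it does not make PTX 5.0 loadable on an older driver.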

The backward compatibility mentioned in the CUDA PTX ISA docs generally doesn’t refer to this kind of CUDA version backward compatibility, in my view.

Since CUDA 8 compiled driver API code would not work without a CUDA 8 driver, I can only assume that the case you are talking about is dropping a CUDA 8 compiled ptx code into an otherwise CUDA 7.5 compiled code (i.e. CUDA 7.5 driver API calls). I wouldn’t expect that to work, in general.

Being able to set the PTX ISA target for ptx compilation might be something you would consider filing an enhancement request for - just a bug with RFE in the description.

njuffa · January 12, 2017, 10:30pm

Unless I am misunderstanding something about the use case, there is no need for customers to “update to CUDA 8”. All they need to do is update to the latest NVIDIA drivers. Any recent driver package comes with the latest JIT compiler which allows PTX 5.0 to be JIT compiled to machine code for any GPU with compute capability >= 2.0.
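The "compute capability >= 2.0" condition can be written down as a small check. This Python sketch is only an illustration under stated assumptions: the helper names are hypothetical, the target is assumed to have the plain `sm_XY` form seen in the listing earlier in the thread, and the driver-side PTX ISA requirement discussed above is deliberately ignored.

```python
def target_cc(target):
    """Convert a PTX target such as 'sm_20' to a (major, minor) tuple."""
    digits = target.split("_", 1)[1]
    return int(digits[:-1]), int(digits[-1])


def jit_possible_for_device(ptx_target, device_cc):
    """PTX for a given target can be JIT compiled for devices with the
    same or a higher compute capability (hypothetical helper)."""
    return device_cc >= target_cc(ptx_target)


K20_CC = (3, 5)  # the Tesla K20 has compute capability 3.5

print(jit_possible_for_device("sm_20", K20_CC))  # True
print(jit_possible_for_device("sm_35", (2, 0)))  # False
```

So with a sufficiently recent driver, the sm_20 PTX from the original question is a valid JIT input for a K20.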

My general recommendation would be to deliver fat binaries to customers, with PTX only for the latest architecture (for forward compatibility) and use JIT compilation only if dynamic code generation is a requirement, based on user input.
