Detect highest supported PTX version

wenzel.jakob · November 21, 2020, 1:12am

Dear all,

I am developing a project that JIT-compiles CUDA kernels from NVIDIA’s PTX intermediate representation via the lower-level driver API. A PTX fragment normally begins with a declaration of the PTX version and target compute capability.

.version 6.3
.target sm_75

The compute capability of the current device is easily detected, e.g. via cuDeviceGetAttribute() and CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, etc.
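As a minimal sketch of how the two directives above could be generated at run time: the helper name `make_ptx_header` is hypothetical, and the compute capability is passed in as plain integers (in a real program they would come from cuDeviceGetAttribute()) so that the snippet compiles without the CUDA toolkit.

```cpp
#include <string>

// Hypothetical helper: builds the ".version" / ".target" preamble of a PTX
// fragment. cc_major/cc_minor are assumed to have been queried from the
// device beforehand; ptx_major/ptx_minor are chosen by the caller.
std::string make_ptx_header(int ptx_major, int ptx_minor,
                            int cc_major, int cc_minor) {
    return ".version " + std::to_string(ptx_major) + "." +
           std::to_string(ptx_minor) +
           "\n.target sm_" + std::to_string(cc_major) +
           std::to_string(cc_minor) + "\n";
}
```

Calling `make_ptx_header(6, 3, 7, 5)` reproduces the two lines shown above.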

However, the PTX version has proven to be somewhat of an annoyance – this number must be increased based on the “.target”. For example sm_86 requires .version 7.1, etc.

My question: Is there any way that I can programmatically detect the highest PTX version supported by the loaded graphics driver? An analogous attribute such as CU_DEVICE_ATTRIBUTE_PTX_VERSION_MAJOR sadly does not exist.

My project targets a truly minimal set of PTX instructions that has remained unchanged for years, so I would just like it to run by default on a new GPU/CUDA version without having to maintain and update a Compute Capability->PTX version mapping myself.

Thanks!

Robert_Crovella · November 21, 2020, 1:45am

I’m not sure I believe all your claims.

If you could not take older PTX and run it on newer devices the CUDA compatibility model would have a big hole in it.

sm_86 requires version 7.1 if you want to target cc8.6 specific features. Otherwise the driver is perfectly capable of forward JIT-compiling older PTX to run on a newer device.

To prove this to myself, I happened to have an installed copy of CUDA 6.0 lying around. I took the vectorAdd_kernel64.ptx file from the vectorAddDrv CUDA sample project, and copied it to a directory with the vectorAddDrv binary built on CUDA 10.2, running on a Tesla V100. There was no problem loading the older CUDA 6.0 PTX (version 1.4) using the vectorAddDrv executable, and it compiled and ran correctly on Tesla V100. For reference the CUDA 10.2 vectorAddDrv project has a vectorAdd_kernel64.ptx file with PTX version 6.5 in it.

In addition, for a particular install, the highest supported PTX version is determined by the installed driver, and not by anything else. You can have a PTX file with an arbitrarily high PTX version (up to the maximum version supported by the GPU driver), and it will still compile and run correctly as long as the target sm is less than or equal to the compute capability of the device you intend to run it on.

wenzel.jakob · November 21, 2020, 9:57am

Dear Robert,

it seems like I misunderstood something central then. My assumption was that the PTX .target should ideally match the compute capability of the actual hardware (which then causes a dependency on the PTX .version mentioned above).

Based on your response, it seems that I can just target the lowest compute capability that my PTX code actually uses (very low, probably sm_20 or sm_30), and I will still benefit from improvements in the PTX->SASS transformations for newer architectures? Could you confirm if this understanding is correct?

Thanks,
Wenzel

wenzel.jakob · November 21, 2020, 9:58am

Or perhaps a more direct way of asking: if I have some PTX with .target sm_20 and I change it to .target sm_86, will that change the generated SASS code?

njuffa · November 21, 2020, 11:11am

I am assuming that your use case requires you to generate PTX code on the fly (e.g. for a user-provided function) which is then JIT compiled to machine code. If so, I would claim the most appropriate approach is to:

(1) Specify the latest available PTX version
(2) Query the compute capability of the desired GPU and specify that as the target architecture.

Robert_Crovella · November 21, 2020, 3:12pm

I certainly can’t provide guarantees of compiler behavior, especially not future behavior. The recommendation given by njuffa seems reasonable to me, except it puts you back into the mode of asking how to get max supported PTX version, and I don’t know how to do that without your own lookup table, and even that doesn’t address the future case. Given this, I see maybe two possibilities:

(1) Build your own lookup table of driver version ranges and supported PTX versions, and acknowledge that this still doesn’t cover the future.
(2) Given that you are using “stable” PTX, choose a high enough PTX version to make future compilation changes less likely, and use that as your default choice.
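A lookup table along the lines of the first option might be sketched as follows. The names `Entry`, `kTable`, and `max_ptx_for_driver` are hypothetical. The key is assumed to be the integer reported by cuDriverGetVersion() (1000 * major + 10 * minor), and the handful of rows shown should be checked against the release history in the PTX ISA documentation before relying on them.

```cpp
struct PtxVersion { int ptx_major; int ptx_minor; };

// One row: the minimum driver API version (as reported by
// cuDriverGetVersion(), e.g. 11010 for CUDA 11.1) that accepts this PTX version.
struct Entry { int min_driver_api; PtxVersion ptx; };

// Illustrative rows only, sorted from newest to oldest.
static const Entry kTable[] = {
    {11010, {7, 1}},  // CUDA 11.1
    {11000, {7, 0}},  // CUDA 11.0
    {10020, {6, 5}},  // CUDA 10.2
    {10010, {6, 4}},  // CUDA 10.1
    {10000, {6, 3}},  // CUDA 10.0
};

// Returns the newest PTX version in the table that the given driver should
// accept, or {0, 0} if the driver is older than every row.
PtxVersion max_ptx_for_driver(int driver_api_version) {
    for (const Entry &e : kTable)
        if (driver_api_version >= e.min_driver_api) return e.ptx;
    return {0, 0};
}
```

A driver newer than the first row simply gets the newest PTX version the table knows about. That stays loadable, but it is exactly the "doesn’t cover the future" limitation: a target introduced after the table was written would still need a new row.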

I will say this: if I have a PTX program, and the PTX is compatible, say, with both PTX version 5 and also PTX version 6, I would be very surprised if there was any difference in the generated SASS when generated by a driver that understands both PTX 5 and PTX 6. I make this statement independent of the target. However, it’s just a statement of my opinion, not a guarantee. I can’t imagine why a driver that knows how to do something with PTX that it understands would do something differently because the PTX was marked version 5 instead of version 6. I cannot imagine why that would make a difference for SASS generation. But I don’t know everything. Clearly, if you are going to target a particular device compute capability, however, you must specify at least a PTX version that recognizes that compute capability. That is what I mean by “independent”, here. If the target compute capability is fixed, and supported, I don’t know why varying PTX version should matter for SASS generation.

You could also file a bug to request a new feature (report GPU driver max supported PTX version). Just to be clear though, that is unlikely to show up as something like CU_DEVICE_ATTRIBUTE_PTX_VERSION_MAJOR because it’s not a device attribute.

(With respect to the advice given by njuffa, the only change I would suggest is for item 1: specify the latest available PTX version that is supported by the currently loaded GPU driver. The driver has a specific error code it will return if you pass it PTX that is of a version it doesn’t recognize.)
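One way that error code could be exploited, sketched here as an assumption rather than something anyone in this thread has tested, is to probe candidate versions from newest to oldest. `probe_highest_ptx` and the `try_load` callback are hypothetical. In a real program the callback would build a tiny PTX module with the given `.version`, hand it to cuModuleLoadData(), and return false when the driver rejects the version (recent driver API headers name this result CUDA_ERROR_UNSUPPORTED_PTX_VERSION, as far as I know).

```cpp
#include <functional>
#include <utility>
#include <vector>

// Returns the first (major, minor) pair accepted by try_load, or {0, 0} if
// none is accepted. candidates_desc must be ordered newest first.
// try_load is a stand-in for "attempt to load a trivial PTX module with this
// version through the driver API and report whether it was accepted".
std::pair<int, int> probe_highest_ptx(
        const std::vector<std::pair<int, int>> &candidates_desc,
        const std::function<bool(int, int)> &try_load) {
    for (const auto &c : candidates_desc)
        if (try_load(c.first, c.second)) return c;
    return {0, 0};
}
```

Keeping the driver call behind a callback means the search logic can be exercised with a fake loader, and the candidate list is still something the application has to maintain.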

As I’m typing this, an alternate, kind of strange approach occurs to me. You could use NVRTC to convert a simple test case to PTX (nvrtcGetPTX()), then use the generated PTX to inform yourself of the current PTX version that the driver understands.
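The parsing half of that idea could look roughly like the sketch below. `parse_ptx_version` is a hypothetical helper that takes the PTX text (for example, the buffer filled in by nvrtcGetPTX()) and pulls the numbers out of the first `.version` directive. It assumes the string ".version" does not appear earlier in a comment. Note also that the version NVRTC emits reflects the installed NVRTC library, which is presumably, but not necessarily, in step with what the driver accepts.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Extracts the major/minor numbers of the first ".version" directive in a
// PTX string. Returns true on success and leaves the outputs untouched
// when no ".version" directive is found.
bool parse_ptx_version(const std::string &ptx, int &major_out, int &minor_out) {
    std::size_t pos = ptx.find(".version");
    if (pos == std::string::npos) return false;
    return std::sscanf(ptx.c_str() + pos, ".version %d.%d",
                       &major_out, &minor_out) == 2;
}
```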

Robert_Crovella · November 21, 2020, 3:32pm

I’m going to go out on a limb here. If your PTX requires compute capability X, then I think it should be reasonable to specify that as a target. That also means you can essentially pick the PTX version Y, once.

I don’t know why the compiler (i.e. the driver), given that fixed PTX code, would do something different when targeting a device of compute capability Z, if the PTX happened to be labelled target X or target Z, or PTX version Y or PTX version W. (Z >= X, W >= Y, version W supports target Z, and W, X, Y, Z are all recognized by the currently loaded driver, yada yada yada.)

I can’t imagine why any of that would make a difference in SASS code generation. But this is not a guarantee, and if you are looking for a “can’t do better than that” kind of assurance, I refer you to previous statements in this thread.

njuffa · November 21, 2020, 8:16pm

Mostly agreed. To me, picking a PTX version is like picking a C++ language standard, e.g. C++98, C++11, C++17. I pick one when I write code, and there is the risk that code written against a newer standard will not compile on an older compiler that only supports older standards. But I agree that the risk is arguably higher when picking a PTX version.

I haven’t tried it, but that seems like a workable approach by thought experiment.

wenzel.jakob · November 21, 2020, 10:46pm

Robert and Norbert,
thank you for the helpful discussion. I will stick with this advice and simply target lower compute capability and PTX version as needed by the actual instructions. I agree that it would be illogical for the SASS to differ.

Best,
Wenzel

PS: Completely unrelated – Norbert, thanks for your inspiring posts about faithfully rounded minimax polynomial approximations here and on StackOverflow. I learned a lot from them!