C vs PTX


I have a few questions. Could someone kindly clarify them?

  1. Can we implement C code using the PTX instruction set? If so, how much performance improvement can we expect, in terms of MCPS or another metric?
  2. Is the PTX instruction set designed only for NVIDIA GPU drivers, or will it also work with other GPU drivers, such as those for AMD GPUs? If not, why not?
  3. Are there other methods, like AVX2, for improving performance? If so, please share links to relevant material.
  4. Will code written for one GPU driver version work on other versions?

Please refer to the PTX documentation. It’s important to realize that PTX is a language, a “virtual” machine code, not an actual machine code.

Yes, it’s possible, with various restrictions and limitations. In fact, that is one step of what the CUDA C++ compiler (nvcc) does: it converts code written in CUDA C++ to PTX (usually among other steps).
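Besides writing whole kernels in PTX, CUDA C++ lets you write individual PTX instructions directly through inline `asm` statements. A minimal sketch, assuming a CUDA-capable GPU and a reasonably recent toolkit; the kernels `add_cpp` and `add_ptx`, the helper `count_mismatches`, and the file name `inline_ptx.cu` are hypothetical names chosen for illustration.

```cuda
// Compare a plain CUDA C++ addition with the same operation written in inline PTX.
// To see the PTX nvcc generates for both kernels:  nvcc -ptx inline_ptx.cu -o inline_ptx.ptx
#include <cstdio>
#include <cuda_runtime.h>

// Plain CUDA C++: nvcc chooses the PTX instructions.
__global__ void add_cpp(const int* a, const int* b, int* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Same addition, with the add instruction written by hand as inline PTX.
__global__ void add_ptx(const int* a, const int* b, int* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int r;
        asm("add.s32 %0, %1, %2;" : "=r"(r) : "r"(a[i]), "r"(b[i]));
        c[i] = r;
    }
}

// Hypothetical helper: for n > 0, runs both kernels on a[i] = i, b[i] = 2*i and returns
// how many results differ from 3*i, or -1 if a CUDA call fails.
int count_mismatches(int n) {
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *a = nullptr, *b = nullptr, *c1 = nullptr, *c2 = nullptr;
    int result = -1;
    if (cudaMallocManaged(&a, bytes) == cudaSuccess &&
        cudaMallocManaged(&b, bytes) == cudaSuccess &&
        cudaMallocManaged(&c1, bytes) == cudaSuccess &&
        cudaMallocManaged(&c2, bytes) == cudaSuccess) {
        for (int i = 0; i < n; ++i) { a[i] = i; b[i] = 2 * i; }
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        add_cpp<<<blocks, threads>>>(a, b, c1, n);
        add_ptx<<<blocks, threads>>>(a, b, c2, n);
        if (cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess) {
            result = 0;
            for (int i = 0; i < n; ++i)
                if (c1[i] != 3 * i || c2[i] != 3 * i) ++result;
        }
    }
    cudaFree(a); cudaFree(b); cudaFree(c1); cudaFree(c2);  // cudaFree(nullptr) is a no-op
    return result;
}

int main() {
    int m = count_mismatches(1 << 20);
    std::printf("mismatches: %d\n", m);
    return m == 0 ? 0 : 1;
}
```

For simple integer code like this, the PTX nvcc emits for `add_cpp` typically already contains an `add.s32` instruction, so the hand-written version gives the compiler nothing it did not already have.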

There is no reason to assume that writing code in one language rather than another automatically improves performance. In general, I would not expect any performance improvement at all from writing your GPU code in PTX instead of CUDA C++. Some improvement is certainly possible in particular cases, but it would be very code-specific and would depend on working in an area where the nvcc compiler is not already fully capable.
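Because any gain is code-specific, the reliable way to settle the performance question for a given kernel is to measure it. A minimal timing sketch, assuming a CUDA-capable GPU: the `scale` kernel is only a placeholder and `time_kernel_ms` is a hypothetical helper; the timing itself uses the CUDA runtime's event functions (`cudaEventCreate`, `cudaEventRecord`, `cudaEventElapsedTime`).

```cuda
// Time a kernel with CUDA events so that, e.g., a CUDA C++ version and a PTX version can be compared.
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel to time; swap in the kernels you actually want to compare.
__global__ void scale(float* x, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

// Hypothetical helper: runs launch() once as a warm-up, then `reps` more times between two
// CUDA events, and returns the average milliseconds per run (negative on error).
template <typename Launch>
float time_kernel_ms(Launch launch, int reps) {
    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess) return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) { cudaEventDestroy(start); return -1.0f; }
    launch();  // warm-up: keeps one-time costs (e.g. module loading) out of the measurement
    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r) launch();
    cudaEventRecord(stop);
    float ms = -1.0f;
    if (cudaEventSynchronize(stop) == cudaSuccess && cudaGetLastError() == cudaSuccess &&
        cudaEventElapsedTime(&ms, start, stop) == cudaSuccess && reps > 0) {
        ms /= reps;
    } else {
        ms = -1.0f;
    }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const int n = 1 << 20;
    float* x = nullptr;
    if (cudaMalloc(&x, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(x, 0, n * sizeof(float));
    const int threads = 256, blocks = (n + threads - 1) / threads;
    float ms = time_kernel_ms([=] { scale<<<blocks, threads>>>(x, 1.0f, n); }, 100);
    std::printf("scale: %.4f ms per launch\n", ms);
    cudaFree(x);
    return ms >= 0.0f ? 0 : 1;
}
```

To compare a CUDA C++ kernel against a hand-written PTX variant, call the helper once per kernel with the same launch configuration and the same input data.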

PTX was designed by NVIDIA, and its principal target is NVIDIA GPUs. However, it is intended to be a virtual language, so, speaking theoretically, I personally don’t know of any specific reason it could not be used for other purposes or targets.

I’m not aware of PTX being usable on AMD GPUs. Because PTX is a language rather than a particular machine code, a tool called ptxas converts it into machine code that runs on NVIDIA GPUs. ptxas is effectively an optimizing compiler, and its output is SASS, the NVIDIA GPU machine code. For PTX to be usable on another architecture, say an AMD GPU, someone would have to create an equivalent tool, essentially an optimizing compiler, that converts PTX into the machine code that runs on that GPU. Such a tool might exist, but I am not aware of one.
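The PTX-to-SASS step can also happen at run time: the NVIDIA driver accepts PTX text and compiles it for whichever GPU is present. A minimal sketch using the CUDA driver API, assuming a driver that accepts PTX ISA 7.0 (CUDA 11.0 or later) and a GPU of compute capability 5.2 or higher; the kernel `add_one`, the helper `run_add_one`, and the file names in the comments are hypothetical. The comments also show the offline route with `ptxas` and `cuobjdump -sass`.

```cuda
// Build (assumed file name): nvcc jit_ptx.cu -o jit_ptx -lcuda
// Offline equivalent of the JIT step, if the PTX text below were saved as add_one.ptx:
//   ptxas -arch=sm_80 add_one.ptx -o add_one.cubin   (PTX -> SASS)
//   cuobjdump -sass add_one.cubin                     (disassemble the SASS)
#include <cstdio>
#include <cuda.h>

// Hand-written PTX: each thread adds 1 to data[threadIdx.x].
static const char* kPtx = R"PTX(
.version 7.0
.target sm_52
.address_size 64

.visible .entry add_one(
    .param .u64 p_data
)
{
    .reg .b32 %r<3>;
    .reg .b64 %rd<4>;

    ld.param.u64       %rd1, [p_data];
    cvta.to.global.u64 %rd2, %rd1;
    mov.u32            %r1, %tid.x;
    mul.wide.u32       %rd3, %r1, 4;
    add.s64            %rd2, %rd2, %rd3;
    ld.global.u32      %r2, [%rd2];
    add.s32            %r2, %r2, 1;
    st.global.u32      [%rd2], %r2;
    ret;
}
)PTX";

// On any driver API error, print it and return -1 (this sketch does not clean up on error).
#define CHECK(call)                                                            \
    do {                                                                       \
        CUresult err_ = (call);                                                \
        if (err_ != CUDA_SUCCESS) {                                            \
            const char* msg_ = nullptr;                                        \
            cuGetErrorString(err_, &msg_);                                     \
            std::fprintf(stderr, "%s failed: %s\n", #call, msg_ ? msg_ : "?"); \
            return -1;                                                         \
        }                                                                      \
    } while (0)

// Hypothetical helper: JIT-compiles kPtx, runs add_one on data[i] = i for 0 < n <= 1024,
// and returns how many results differ from i + 1 (or -1 on error).
int run_add_one(int n) {
    if (n <= 0 || n > 1024) return -1;
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction fn;
    CUdeviceptr d;
    int h[1024];
    size_t bytes = n * sizeof(int);

    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK(cuCtxSetCurrent(ctx));
    CHECK(cuModuleLoadData(&mod, kPtx));  // the driver compiles the PTX to SASS here
    CHECK(cuModuleGetFunction(&fn, mod, "add_one"));

    for (int i = 0; i < n; ++i) h[i] = i;
    CHECK(cuMemAlloc(&d, bytes));
    CHECK(cuMemcpyHtoD(d, h, bytes));
    void* args[] = { &d };
    CHECK(cuLaunchKernel(fn, 1, 1, 1, n, 1, 1, 0, nullptr, args, nullptr));  // 1 block, n threads
    CHECK(cuCtxSynchronize());
    CHECK(cuMemcpyDtoH(h, d, bytes));

    int wrong = 0;
    for (int i = 0; i < n; ++i)
        if (h[i] != i + 1) ++wrong;
    cuMemFree(d);
    cuModuleUnload(mod);
    cuDevicePrimaryCtxRelease(dev);
    return wrong;
}

int main() {
    int wrong = run_add_one(256);
    std::printf("wrong results: %d\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```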

This strikes me as a very broad question. There are all sorts of optimization methods for making CUDA code run faster on NVIDIA GPUs. I’m not going to cover them all here.

If you’re serious about these topics, you may wish to learn more about CUDA programming. Here is one such public resource, which provides an orderly, introductory approach to CUDA; several of its sections include material on optimizing CUDA code.

CUDA has a variety of compatibility mechanisms. Code written to target a particular CUDA version will run on that architecture/version and on any future versions, with various limitations, provided the proper steps are taken during compilation and code generation.
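One of those compilation steps is embedding PTX alongside SASS, so that the driver can JIT-compile the PTX for GPU architectures newer than the ones you compiled for. A minimal sketch, assuming a CUDA-capable device 0; the `sm_70`/`compute_70` targets in the build comment are only an example, and `VersionInfo`/`query_versions` are hypothetical names. The program reports the driver version, the runtime version, and the device's compute capability, the quantities CUDA's compatibility rules are expressed in.

```cuda
// Build (example targets): embed sm_70 SASS plus compute_70 PTX so newer GPUs can JIT the PTX:
// nvcc -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 versions.cu -o versions
#include <cstdio>
#include <cuda_runtime.h>

struct VersionInfo {
    int driver;        // latest CUDA version the installed driver supports, e.g. 12040 for 12.4
    int runtime;       // CUDA runtime version this program was built against
    int major, minor;  // compute capability of device 0
};

// Hypothetical helper: fills `info` and returns true on success.
bool query_versions(VersionInfo* info) {
    if (cudaDriverGetVersion(&info->driver) != cudaSuccess) return false;
    if (cudaRuntimeGetVersion(&info->runtime) != cudaSuccess) return false;
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    info->major = prop.major;
    info->minor = prop.minor;
    return true;
}

int main() {
    VersionInfo v{};
    if (!query_versions(&v)) { std::puts("no usable CUDA device/driver"); return 1; }
    std::printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d, device is sm_%d%d\n",
                v.driver / 1000, (v.driver % 1000) / 10,
                v.runtime / 1000, (v.runtime % 1000) / 10, v.major, v.minor);
    // Conservative check only: recent CUDA releases also offer minor-version compatibility,
    // so a driver older than the runtime is not always fatal.
    if (v.driver < v.runtime) std::puts("warning: driver is older than the CUDA runtime in use");
    return 0;
}
```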


Could you just list the names of the methods for making code run faster on NVIDIA GPUs?

This document is a good summary of many practices and techniques that allow you to maximize CUDA application performance and improve reliability.

CUDA C Best Practices Guide
