Please refer to the PTX documentation. It’s important to realize it is a language, a “virtual” machine code, not an actual machine code.
Yes, it's possible, with various restrictions and limitations. That is actually one step of what the CUDA C++ compiler (called `nvcc`) does. It can convert code written in CUDA C++ to PTX (usually as one of several compilation steps).
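As a minimal sketch of that step: the file name `scale.cu`, the kernel, and the host helper below are my own illustrative choices, not anything from your code. Compiling with the `-ptx` option asks `nvcc` to stop after generating PTX, and the resulting `.ptx` file is plain text that you can open and read.

```cuda
// scale.cu (hypothetical file name used for illustration)
// Emit PTX only:        nvcc -ptx scale.cu -o scale.ptx
// Build an executable:  nvcc scale.cu -o scale
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Illustrative kernel: multiply each element by a constant factor.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Host helper: copy the input to the GPU, run the kernel, return the result.
// Error checking is omitted to keep the sketch short.
std::vector<float> scale_on_gpu(std::vector<float> v, float factor)
{
    int n = static_cast<int>(v.size());
    if (n == 0) return v;  // nothing to launch

    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, v.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(d, factor, n);
    cudaMemcpy(v.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return v;
}

int main()
{
    std::vector<float> out = scale_on_gpu({1.0f, 2.0f, 3.0f}, 2.0f);
    for (float x : out) printf("%f\n", x);
    return 0;
}
```

Only the `__global__` function ends up in the PTX file; the host code is handled separately by the host compiler.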
There is no reason to assume that writing code in one language or another automatically gives a performance improvement. In general, I would say there is no reason to expect any performance improvement at all if you wrote your GPU code in PTX as opposed to CUDA C++. It's certainly possible that there might be some improvement in some cases, but it would be very code-specific, and it would mean working in an area where the `nvcc` compiler was not already fully capable.
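If you do find such a case, you don't necessarily have to write a whole kernel in PTX. CUDA C++ supports inline PTX through `asm()` statements, so a single instruction can be dropped into otherwise ordinary code. Here is a small sketch that reads the `%laneid` special register; the function and kernel names are my own, and this is meant to show the mechanism, not to claim any speedup.

```cuda
// Build (assumed command line): nvcc lane.cu -o lane
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Read the PTX special register %laneid (this thread's index within its warp).
__device__ unsigned lane_id()
{
    unsigned lane;
    asm volatile("mov.u32 %0, %%laneid;" : "=r"(lane));
    return lane;
}

// Each thread records its own lane index.
__global__ void record_lanes(unsigned *out)
{
    out[threadIdx.x] = lane_id();
}

// Host helper: launch one 1D block of n threads and return the lane indices.
// Error checking is omitted to keep the sketch short.
std::vector<unsigned> lanes_for_block(int n)
{
    std::vector<unsigned> h(n);
    unsigned *d = nullptr;
    cudaMalloc(&d, n * sizeof(unsigned));
    record_lanes<<<1, n>>>(d);
    cudaMemcpy(h.data(), d, n * sizeof(unsigned), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}

int main()
{
    std::vector<unsigned> lanes = lanes_for_block(64);
    for (int i = 0; i < 64; ++i)
        printf("thread %d -> lane %u\n", i, lanes[i]);
    return 0;
}
```

Whether a hand-written instruction sequence beats what the compiler produces is something you would have to measure for your specific code.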
It was designed by NVIDIA and the principal target is NVIDIA GPUs. However it is intended to be a virtual language, so I personally don’t know of any specific reasons it could not be used for other purposes or targets, speaking theoretically.
I’m not aware of PTX being usable on AMD GPUs. Since PTX is a language, not a particular machine code, there is a tool (called `ptxas`) that converts this language to machine code that will run on NVIDIA GPUs. This tool is effectively an optimizing compiler, and its output is SASS code, which is NVIDIA GPU machine code. For PTX to be usable on another architecture, let’s say an AMD GPU, someone would need to create an equivalent tool, something like an optimizing compiler, that converts PTX to whatever machine code runs on an AMD GPU. Such a tool might exist; however, I am not aware of one.
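To make the "PTX is text that still needs compiling" point concrete, here is a rough sketch that hands a PTX string to the CUDA driver API at run time. When a module is loaded from PTX, the driver compiles it just in time to machine code for whatever NVIDIA GPU is present, which is the same job `ptxas` does offline. The PTX kernel, its name `store_42`, and the `.version`/`.target` values are my own assumptions for illustration and may need adjusting for your toolkit and GPU.

```cuda
// Build (assumed command line): nvcc jit_ptx.cu -o jit_ptx -lcuda
#include <cstdio>
#include <cstdlib>
#include <cuda.h>

// Abort with a message if a driver API call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        CUresult r_ = (call);                                          \
        if (r_ != CUDA_SUCCESS) {                                      \
            const char *msg_ = nullptr;                                \
            cuGetErrorString(r_, &msg_);                               \
            fprintf(stderr, "%s failed: %s\n", #call,                  \
                    msg_ ? msg_ : "unknown error");                    \
            exit(1);                                                   \
        }                                                              \
    } while (0)

// Hand-written PTX (illustrative): store the value 42 at the given address.
static const char *kPtx = R"(
.version 6.4
.target sm_52
.address_size 64

.visible .entry store_42(.param .u64 out_ptr)
{
    .reg .b64 %rd<3>;
    .reg .b32 %r<2>;

    ld.param.u64        %rd1, [out_ptr];
    cvta.to.global.u64  %rd2, %rd1;
    mov.u32             %r1, 42;
    st.global.u32       [%rd2], %r1;
    ret;
}
)";

// Load the PTX, let the driver compile it, run it, and return what it wrote.
unsigned run_ptx_kernel()
{
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction fn;
    CUdeviceptr d_out;
    unsigned result = 0;

    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK(cuCtxSetCurrent(ctx));

    CHECK(cuModuleLoadData(&mod, kPtx));  // PTX is compiled to machine code here
    CHECK(cuModuleGetFunction(&fn, mod, "store_42"));

    CHECK(cuMemAlloc(&d_out, sizeof(result)));
    void *args[] = { &d_out };
    CHECK(cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, 0, args, 0));
    CHECK(cuMemcpyDtoH(&result, d_out, sizeof(result)));

    CHECK(cuMemFree(d_out));
    CHECK(cuModuleUnload(mod));
    CHECK(cuDevicePrimaryCtxRelease(dev));
    return result;
}

int main()
{
    printf("kernel wrote %u\n", run_ptx_kernel());
    return 0;
}
```

Nothing in this flow exists for a non-NVIDIA target: both `ptxas` and the driver's compiler only produce NVIDIA machine code, which is why another vendor would need its own PTX compiler.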
This strikes me as a very broad question. There are all sorts of optimization methods for making CUDA code run faster on NVIDIA GPUs. I’m not going to cover them all here.
If you’re serious about these topics, you may wish to learn more about CUDA programming. Here is one such public resource which provides an orderly, introductory approach to CUDA. It includes material on optimizing CUDA codes in a number of the sections.
CUDA has a variety of compatibility mechanisms. Code written to target a particular CUDA version will run on that architecture/version and any future versions, with various limitations, assuming proper steps are taken during compilation/code generation.
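As a rough illustration of the "proper steps" part: one common approach is to embed PTX in the executable alongside the machine code, so that the driver can compile it for a newer GPU later. The `-gencode` line in the comment below uses `compute_80`/`sm_80` purely as an example architecture, and the file name and helper functions are hypothetical. The program itself just reports the versions involved, using standard CUDA runtime API queries.

```cuda
// Build with SASS for one architecture plus embedded PTX (example values):
//   nvcc -gencode arch=compute_80,code=sm_80 \
//        -gencode arch=compute_80,code=compute_80 versions.cu -o versions
#include <cstdio>
#include <cuda_runtime.h>

// CUDA encodes versions as 1000 * major + 10 * minor (e.g. 12040 is 12.4).
int version_major(int v) { return v / 1000; }
int version_minor(int v) { return (v % 1000) / 10; }

int main()
{
    int runtime_version = 0, driver_version = 0;
    cudaRuntimeGetVersion(&runtime_version);  // toolkit the app was built with
    cudaDriverGetVersion(&driver_version);    // newest CUDA the driver supports

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no usable CUDA device found\n");
        return 1;
    }

    printf("runtime: %d.%d\n", version_major(runtime_version),
           version_minor(runtime_version));
    printf("driver supports up to: %d.%d\n", version_major(driver_version),
           version_minor(driver_version));
    printf("device 0: %s, compute capability %d.%d\n",
           prop.name, prop.major, prop.minor);
    return 0;
}
```

If the device's compute capability is newer than any architecture you built machine code for, the embedded PTX is what allows the code to run there.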