All device code for NVIDIA GPUs is indeed ultimately compiled by the ptxas compiler (or the equivalent functionality in the GPU driver). There are a few user-created assemblers out there (e.g. maxas), but I don’t think these are relevant to this discussion.
However, the CUDA device code compilation process doesn’t necessarily begin with ptxas. The conversion of source code (in whatever form it may be) to PTX may follow a number of available paths, some of which are not wholly created by NVIDIA or part of the NVIDIA-provided toolchain(s). I’ll mention two examples:
clang has the ability to compile CUDA C++ device code:
GNU tools have the ability to compile OpenACC device source code:
As far as I know, both of these examples build fatbinaries with embedded PTX, so the result is directly runnable as a “CUDA executable”. The conversion from PTX to GPU machine code is then handled by the GPU driver at run time, just as it is for a CUDA executable built with e.g. -gencode arch=compute_30,code=compute_30 using the NVIDIA nvcc toolchain.
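To make the PTX-only case a bit more concrete, here is a minimal sketch rather than anything definitive. The file name add.cu, the kernel add_one and the helper add_one_on_gpu are hypothetical names I made up for illustration. The nvcc line in the comments uses the -gencode setting mentioned above; recent CUDA toolkits no longer accept compute_30, so substitute an architecture your toolkit supports. The clang line follows my reading of the LLVM "Compiling CUDA with clang" documentation and may need adjusting for your installation.

```cuda
// add.cu (hypothetical file name, used only for this sketch)
//
// PTX-only build with nvcc: no machine code is embedded, so the driver
// compiles the PTX when the program runs:
//   nvcc -gencode arch=compute_30,code=compute_30 -o add add.cu
//
// Inspect what ended up in the fatbinary:
//   cuobjdump -ptx ./add     (lists the embedded PTX)
//   cuobjdump -sass ./add    (lists embedded machine code, if any)
//
// Building the same file with clang (assumption: flags as in the LLVM docs):
//   clang++ add.cu -o add --cuda-gpu-arch=<GPU arch> \
//       -L<CUDA install path>/lib64 -lcudart_static -ldl -lrt -pthread
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: add 1 to each of the n elements.
__global__ void add_one(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Copies host data to the GPU, runs the kernel, and copies the result back.
// Returns true only if every CUDA call succeeded.
bool add_one_on_gpu(int *host, int n)
{
    int *dev = nullptr;
    size_t bytes = n * sizeof(int);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // With a PTX-only binary, the driver has to compile the kernel
        // before this launch can run.
        add_one<<<(n + 127) / 128, 128>>>(dev, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaDeviceSynchronize() == cudaSuccess
          && cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok;
}

int main()
{
    const int n = 256;
    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i;

    if (!add_one_on_gpu(host, n)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("host[0] = %d, host[%d] = %d\n", host[0], n - 1, host[n - 1]);
    return 0;
}
```

Nothing in the source changes between the builds; the only difference is what gets embedded in the fatbinary and therefore how much work is left for the driver at run time.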
I’m not trying to provide any value judgments here, or any statements of suitability.