running .ptx on GPU

Hi.

Let's say I have a .ptx program. How do I run it on the GPU and read the results? For example, the .ptx program multiplies two matrices, A*B=C, and I want to read C and load it into host memory.

PTX is just an intermediate language. It still needs to be optimized by the CUDA compiler (or the driver, which I think can do JIT compilation now). You’d have to initialize the driver, copy the matrix data to the device (of course, allocate memory for the result ‘C’), run the PTX program via the driver call (check the programming guide for the exact function), then copy your result back to host memory.

Sorry, but I don't quite understand. In the CUDA Programming Guide you can find example programs, but they use "black box" C functions: cudaMalloc, cudaMemcpy, etc. I couldn't find anything about, say, a "CUDA call" that I could invoke from, for example, assembler with parameters like "rax = address of the PTX program, rbx = address where to copy the result", or even a C function such as Run_ptx(ptx_program_file, return_values). Or, more probably, I didn't understand what you meant. It would be much easier if you could give a simple example, or a link to one, of calling a PTX program via the driver.

Unfortunately it’s a fairly complex process to run a PTX program using any of the CUDA interfaces. There is no way to call a PTX program explicitly via x86 assembly as in your example, because PTX is a virtual instruction set that must be recompiled and loaded onto the GPU before it is executed. The process is more along the lines of using host code to pass PTX plus some launch parameters to the NVIDIA driver via some API calls, which will execute the code on the host app’s behalf.

When you run nvcc on a CUDA file, all of the kernel calls get converted to some pretty obfuscated C library calls:

somekernel<<< ctas, threads, memory >>>(Parameter);

gets converted to something like

cudaConfigureCall( ctas, threads, memory );

cudaSetupArgument( Parameter, … );

cudaLaunch( somekernel );

And this program gets compiled by a native compiler and linked against the CUDA runtime library, which implements cudaConfigureCall, cudaSetupArgument, and cudaLaunch.

This would not be so bad if cudaLaunch actually took a PTX string as an argument. But at least in everything generated by nvcc, it takes a very oddly formatted handle (a function pointer cast to a const char*) to a fat binary which has to have already been registered with the CUDA runtime. The PTX source is actually inlined as a static array inside a fat binary struct, which is registered with the runtime by a static constructor that runs before main().

Now it is possible to run a PTX file directly using the driver-level API, but you have to pass in parameters and configure the call manually using API calls:

see:

cuModuleLoad

cuModuleGetFunction

cuParamSetSize

cuFuncSetBlockShape

cuFuncSetSharedSize

cuParamSetv

cuLaunchGrid

and use nvcc --ptx to generate a PTX text file that can be loaded by cuModuleLoad.
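Putting the steps above together, here is a minimal sketch of a host program that loads a PTX file and launches a matrix-multiply kernel with the driver API calls listed above. The file name "matmul.ptx", the kernel name "matmulKernel", the matrix size N, and the parameter layout (three device pointers A, B, C) are all assumptions; they must match whatever your PTX actually defines. Error checking (every cu* call returns a CUresult) is omitted for brevity but should not be skipped in real code. Link with -lcuda.

```c
/* Sketch: run matmul.ptx via the CUDA driver API and read back C.
   "matmul.ptx", "matmulKernel", N, and the parameter layout are
   assumptions that must match your actual PTX. */
#include <stdio.h>
#include <cuda.h>   /* driver API header; link with -lcuda */

#define N 16        /* assumed square matrix dimension */

int main(void)
{
    float A[N*N], B[N*N], C[N*N];
    /* ... fill A and B with your input data ... */

    /* 1. Initialize the driver and create a context on device 0 */
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    /* 2. JIT-compile and load the PTX, then look up the kernel */
    CUmodule mod;
    cuModuleLoad(&mod, "matmul.ptx");
    CUfunction func;
    cuModuleGetFunction(&func, mod, "matmulKernel");

    /* 3. Allocate device memory (including the result C) and copy inputs */
    CUdeviceptr dA, dB, dC;
    cuMemAlloc(&dA, sizeof(A));
    cuMemAlloc(&dB, sizeof(B));
    cuMemAlloc(&dC, sizeof(C));
    cuMemcpyHtoD(dA, A, sizeof(A));
    cuMemcpyHtoD(dB, B, sizeof(B));

    /* 4. Pack the kernel parameters (assumed layout: A, B, C pointers) */
    int offset = 0;
    cuParamSetv(func, offset, &dA, sizeof(dA)); offset += sizeof(dA);
    cuParamSetv(func, offset, &dB, sizeof(dB)); offset += sizeof(dB);
    cuParamSetv(func, offset, &dC, sizeof(dC)); offset += sizeof(dC);
    cuParamSetSize(func, offset);

    /* 5. Configure the launch shape and run a 1x1 grid of NxN blocks */
    cuFuncSetBlockShape(func, N, N, 1);
    cuFuncSetSharedSize(func, 0);
    cuLaunchGrid(func, 1, 1);

    /* 6. Copy the result C back to host memory */
    cuMemcpyDtoH(C, dC, sizeof(C));

    cuMemFree(dA); cuMemFree(dB); cuMemFree(dC);
    cuCtxDestroy(ctx);
    return 0;
}
```

Note that cuParamSetv / cuFuncSetBlockShape / cuLaunchGrid are the older execution-control calls; newer driver API versions fold all of this into a single cuLaunchKernel call, but the overall flow (init, load module, get function, set arguments, launch, copy back) is the same.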