Newbie - How can I execute the manually modified PTX file?

So far I only know the compile options -ptx and -cubin.
I want to modify the PTX file compiled from a CU file and then execute it, to optimize performance.
Please tell me how to execute the modified PTX file.

Use ptxas to compile the PTX to a cubin, or use the driver API to load the PTX directly (new functionality in 2.1).

In general, I strongly suggest you don’t bother with PTX files. They’re not real machine code (see: decuda). You won’t get much performance out of hand-tweaking (and even a 2x speedup is not a lot of performance!). The biggest performance gains, those that give you 10-100x speedups, come from how you issue memory requests, not from the PTX instructions. Writing kernels in assembly also keeps you from optimizing them further.

In short, don’t bother with PTX. If you have doubts, check by disassembling with decuda, but code only in C. (There are also some intrinsics available that map to PTX instructions.)

Thanks for your help. I suppose I should take a look at decuda and also install v2.1.

Enable the “-v” option of NVCC while compiling. It will list all the command lines it runs, step by step.

Now, copy and paste the portion of the compilation that is done after the PTX is generated.

Just re-run those commands on your modified PTX…

I think this worked for me long ago…