Debugging info in PTX

I’m trying to debug a small program and am running into an error when I try to add debugging information to the PTX file. Here’s what my setup looks like:

    x86_64 Linux host

    CUDA 3.2 SDK

    GPU in use is GeForce GTX 260 (secondary card and X is not running)

    C program using driver interface to load a PTX file

    PTX file generated from a single .cu file which defines one global function (and does not call any other functions)

    Running the device code is an optional part of the program; the rest of the program runs without issue.

When I compile the .cu file with this:

/usr/local/cuda/bin/nvcc -ptx -arch=sm_11 -v

and run with cuda-memcheck, the program runs to completion and cuda-memcheck reports an illegal access (this isn’t the question I’m asking, just background in case it is helpful).

When I add the ‘-G’ flag to the above and attempt to run again, I get an error about the program not terminating successfully. Running in cuda-gdb shows that the error is ultimately due to a failure to load the PTX file (the load returns CUDA_ERROR_NO_BINARY_FOR_GPU).

Am I doing something incorrectly? Also, does anyone have a good reference for debugging PTX files? The cuda-gdb docs seem to only mention that it is possible, without actually describing how to break in the device code.



EDIT - added GPU model and information about X usage

I am not entirely sure, but I believe specifying -arch=sm_11 will result in a fat binary that only contains machine code for sm_11, whereas you want PTX as well. Does replacing -arch=sm_11 with the following result in any improvement?

-gencode arch=compute_11,"code=sm_11,compute_11"

This should ensure that the fat binary contains both machine code and matching PTX.
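
If the load still fails, one thing that might help narrow it down is comparing the .version and .target directives at the top of the PTX generated with and without -G, since those state which PTX version and architecture the file declares. Below is a minimal C sketch of that check, not a definitive tool. The helper names is_header_directive and print_ptx_header are hypothetical (they are not part of the CUDA SDK), and the sketch assumes each directive sits on its own line of fewer than 1024 characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if s begins with word followed by whitespace or end of string. */
static int starts_with_word(const char *s, const char *word)
{
    size_t n = strlen(word);
    return strncmp(s, word, n) == 0 &&
           (s[n] == '\0' || isspace((unsigned char)s[n]));
}

/* Hypothetical helper: returns 1 if the line, after leading whitespace,
 * starts with the PTX directive ".version" or ".target", else 0. */
static int is_header_directive(const char *line)
{
    assert(line != NULL);
    while (*line != '\0' && isspace((unsigned char)*line))
        line++;
    return starts_with_word(line, ".version") ||
           starts_with_word(line, ".target");
}

/* Hypothetical helper: copies the .version/.target lines of the PTX file
 * at path to out. Returns the number of lines copied, or -1 if the file
 * cannot be opened. Assumes lines are shorter than the buffer. */
static int print_ptx_header(const char *path, FILE *out)
{
    char line[1024];
    int found = 0;
    FILE *in = fopen(path, "r");

    if (in == NULL)
        return -1;
    while (fgets(line, sizeof line, in) != NULL) {
        if (is_header_directive(line)) {
            fputs(line, out);
            found++;
        }
    }
    fclose(in);
    return found;
}
```

Calling print_ptx_header on each of the two PTX files (for example with stdout as the second argument) and comparing the printed lines would show whether adding -G changed what the file declares as its target.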