Debugging info in PTX

I’m trying to debug a small program and running into an error when trying to add debugging information to the PTX file. Here’s what my setup looks like:

    [*] x86_64 Linux host

    [*] CUDA 3.2 SDK

    [*] GPU in use is GeForce GTX 260 (secondary card and X is not running)

    [*] C program using driver interface to load a PTX file

    [*] PTX file generated from a single .cu file which defines one global function (and does not call any other functions)

    [*] Running the device code is an optional part of the program, the rest of the program runs without issue.

When I compile the .cu file with this:

/usr/local/cuda/bin/nvcc -ptx -arch=sm_11 -v kernel.cu

and run with cuda-memcheck, the program runs to completion and cuda-memcheck reports an illegal access (this isn’t the question I’m presenting, just background in case it is helpful).

When I add the ‘-G’ flag to the above command and run again, I get an error saying the program did not terminate successfully. Running in cuda-gdb shows the error is ultimately due to a failure to load the PTX file (the load returns CUDA_ERROR_NO_BINARY_FOR_GPU).

Am I doing something incorrectly? Also, does anyone have a good reference for debugging PTX files? The cuda-gdb docs seem only to mention that it is possible, without actually describing how to break in the device code.

Thanks,

–Joe

EDIT - added GPU model and information about X usage

I am not entirely sure, but I believe specifying -arch=sm_11 will result in a fat binary that contains only machine code for sm_11, whereas you want PTX as well. Does replacing -arch=sm_11 with the following result in any improvement?

-gencode arch=compute_11,"code=sm_11,compute_11"

This should ensure that the fat binary contains both machine code and matching PTX.
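
It may also help to compare what the two PTX files (with and without -G) declare at the top, since every PTX file starts with a `.version` and a `.target` directive. Here is a minimal sketch in plain C for pulling those lines out of PTX text that has already been read into memory. The helper name `ptx_find_directive` is made up for illustration and is not part of the CUDA driver API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a CUDA API): copy the first line of `ptx` that
 * begins with `directive` (for example ".target") into `out`, without the
 * leading whitespace or the trailing newline.
 * Returns 1 if such a line was found, 0 otherwise. */
int ptx_find_directive(const char *ptx, const char *directive,
                       char *out, size_t out_len)
{
    size_t dlen;
    const char *line;

    assert(ptx != NULL && directive != NULL && out != NULL);
    if (out_len == 0)
        return 0;

    dlen = strlen(directive);
    line = ptx;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        const char *p = line;
        size_t rest;

        /* Skip leading spaces and tabs on this line. */
        while (p < line + len && (*p == ' ' || *p == '\t'))
            p++;
        rest = len - (size_t)(p - line);

        if (rest >= dlen && strncmp(p, directive, dlen) == 0) {
            /* Drop a trailing carriage return, if any. */
            if (rest > 0 && p[rest - 1] == '\r')
                rest--;
            /* Truncate to fit the caller's buffer. */
            if (rest >= out_len)
                rest = out_len - 1;
            memcpy(out, p, rest);
            out[rest] = '\0';
            return 1;
        }

        if (end == NULL)
            break;
        line = end + 1;
    }
    return 0;
}
```

Calling this with ".version" and ".target" on the contents of each kernel.ptx shows what each build declares, which can then be checked against what the installed driver and the GTX 260 accept.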