PTX instruction-level tracing

I have a C program that uses the CUDA driver API to compile and launch a hand-generated PTX kernel. Is it possible to use cuda-gdb to trace the execution of the kernel at the PTX instruction level? If not, could anyone please suggest a tool that can do this?

OK, I figured out that it is possible to trace PTX with cuda-gdb 5.5. Thanks.