PTX code alongside C code: -LIST:source=on is NOT working

The CUDA FAQ (in the “CUDA announcement and news” section) says PTX code can be viewed side by side with C code by adding “--opencc-options -LIST:source=on” to the nvcc command line.

However, the generated PTX file still contains only assembly. When I enable the verbose option, I see that “-LIST:source=on” is being passed on the nvopencc command line…

I am compiling only a “debug” build.

Can someone tell me what else I should do to see PTX and C side by side?

Thank you,

I've never heard of the option, but I am very keen to find out the answer to your question. I quickly lose track of where I am in my C code when looking at the PTX…

Yeah, me too. It would make our job much simpler, and it would make training people on CUDA easier too…

This option is mentioned in the CUDA FAQ 1.0 by Simon Green. Check out the “CUDA announcement and news” forum.

I think the NVIDIA guys are busy with tight schedules… I don't see them answering any of the queries… :thumbsdown:

Well, they have said that a new CUDA beta is likely by the end of the month: maybe they are working on that :)

Regarding side-by-side C and PTX, I’m not sure what problem you are having. Maybe you aren’t specifying -keep or -ptx to keep the generated PTX?

test.cu

__global__ void kernel(float *d_out)
    {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    float a = idx*0.05f + 4.2f;
    for (int i = 0; i < idx; i++)
        a += 0.01f;

    d_out[idx] = a;
    }

nvcc -keep --opencc-options -LIST:source=on test.cu

test.ptx (abbreviated)

//   1  __global__ void kernel(float *d_out)
$LBB1__Z6kernelPf:
    .loc    12  5   0
 //   2     {
 //   3     int idx = blockIdx.x * blockDim.x + threadIdx.x;
 //   4
 //   5     float a = idx*0.05f + 4.2f;
    mov.u16     $rh1, %ctaid.x;         //
    mov.u16     $rh2, %ntid.x;          //
    mul.wide.u16    $r1, $rh1, $rh2;    //
    cvt.u32.u16     $r2, %tid.x;        //
    add.u32     $r3, $r2, $r1;          //
    mov.f32     $f1, 0f40866666;        //  4.2
    cvt.rn.f32.s32  $f2, $r3;       //
    mov.f32     $f3, 0f3d4ccccd;        //  0.05
    mad.f32     $f4, $f2, $f3, $f1;     //
    mov.s32     $r4, 0;                 //
    setp.le.s32     $p1, $r3, $r4;      //
    @$p1 bra    $Lt_0_5;                //
    mov.s32     $r5, $r3;               //
    mov.s32     $r6, 0;                 //
    mov.s32     $r7, $r5;               //
$Lt_0_7:
 //<loop> Loop body line 5, nesting depth: 1, estimated iterations: unknown
    .loc    12  7   0
 //   6     for (int i = 0; i < idx; i++)
 //   7         a += 0.01f;
    mov.f32     $f5, 0f3c23d70a;        //  0.01
    add.f32     $f4, $f4, $f5;          //
    add.s32     $r6, $r6, 1;            //
    setp.ne.s32     $p2, $r6, $r3;      //
    @$p2 bra    $Lt_0_7;                //
$Lt_0_5:
    .loc    12  9   0
 //   8
 //   9     d_out[idx] = a;
    ld.param.u64    $rd1, [__cudaparm__Z6kernelPf_d_out];   //  id:25 __cudaparm__Z6kernelPf_d_out+0x0
    cvt.u64.s32     $rd2, $r3;          //
    mul.lo.u64  $rd3, $rd2, 4;      //
    add.u64     $rd4, $rd1, $rd3;       //
    st.global.f32   [$rd4+0], $f4;  //  id:26
    exit;                           //
    } // _Z6kernelPf

This is with CUDA 1.1 btw.

Thanks for your reply. I am invoking it with only the -keep option… but from within the VS2005 environment. The “-v” option shows that the -LIST:source=on option is being passed to nvopencc… I am really not sure what is happening here… :-(

Let me retry it from the command line and see if it makes a difference.

Appreciate your time on this.

And yes, I am on 1.1 too… I transitioned long ago… Nice to know another CUDA release is around the corner.

Best Regards,
Sarnath

I’ve only ever tried the list source mode in Linux, so that might be another factor.

Ah, I see. Can someone try it on Windows and confirm the behaviour I saw?

At least we can hope this gets fixed by the next release…

Best Regards,

Sarnath

It works on WinXP as well.

nvcc.exe -link -ccbin “C:\Program Files\Microsoft Visual Studio 8\VC\bin” -DWIN32 -D_DEBUG -D_CONSOLE -Xcompiler “/EHsc /W3 /nologo /Wp64 /Od /Zi /MDd /GR” -I"C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\common\inc" -I"C:\CUDA\include" --opencc-options -LIST:source=on -keep kern.cu main.cpp

(Not finished) bitonic sort for 1024 elements, attached as kern.txt and main.cpp.

The two included files generate over 1 MB of listings, and the PTX is present as well.

Well, you say the PTX is present. But does it have C mixed with PTX, i.e. C and PTX side by side as shown in Mr. Anderson’s post?

Is it the listing you are looking for?
Did you try my example?



// Loop body line 25, nesting depth: 1, estimated iterations: unknown
.loc 13 31 0
// 27 // Parallel bitonic sort.
// 28 for (int k = 2; k <= 512; k<<=1) // k *= 2)
// 29 {
// 30 // Bitonic merge:
// 31 for (int j = k / 2; j>0; j>>=1) // j /= 2)
div.s32 $r11, $r10, 2; //
mov.s32 $r12, $r11; //
mov.s32 $r13, 0; //
setp.le.s32 $p1, $r11, $r13; //
@$p1 bra $Lt_0_24; //
$Lt_0_26:
// Loop body line 31
xor.b32 $r14, $r12, $r3; //
setp.le.u32 $p2, $r14, $r3; //
@$p2 bra $Lt_0_27; //
// Part of loop body line 31, head labeled $Lt_0_26
mul.lo.u32 $r15, $r14, 4; //
ld.shared.f32 $f2, [$r6+0]; // id:77 __cuda_shared8+0x0
add.u32 $r16, $r15, $r1; //


Sorry guys. When I removed “#pragma unroll”, the C and PTX started appearing side by side…

With the “#pragma” present, the side-by-side C and PTX listing is NOT generated for that FOR loop.

I think that's probably understandable…

My mistake guys… Sorry for troubling you all.

Thanks to “.m.” and Mr. Anderson for their time.

Best Regards,

Sarnath

nvcc warning : Option '--opencc-options (-Xopencc)' is obsolete and ignored

What’s the current way to display C alongside PTX (CUDA 4.1)? Thanks!

Note that the “mixed listing” feature was tied to a component-specific flag. As CUDA 4.1 introduces a new frontend for sm_2x and up, that component has been replaced, and thus the flag is no longer accepted. In general, component-specific flags tend to be unsupported, and I would advise against their use in production builds.

There does not seem to be an equivalent “mixed listing” functionality provided via the new frontend. Sorry for the inconvenience. If you find the “mixed listing” capability useful (I was not aware of this feature and thus do not know what the listing looks like) and would like to see it re-instated, I would suggest filing a feature request through the bug reporting form. This is accessible via the registered developer website at partners.nvidia.com. There is a link in a menu on the left-hand side of the screen.

OK, thanks for the quick reply. For me it was useful when comparing the PTX output of two versions of the same, slightly modified source code: it helped me quickly find what to look at, though there are probably plenty of other ways to achieve that. It’d be great if somebody more experienced with PTX could give some tips!

Btw, is there perhaps some workaround to display combined C+PTX?

P.S. Are there named barriers in CUDA 4.1? I’d insert two barriers A and B around the interesting region of CUDA code and search for them in the matching PTX.

If you just want to use them as markers, you could make your own by inserting inline PTX “assembler” comments:

asm volatile ("// this is line ...");

Thanks tera, I’ll try that out!