PTX code along with C code -LIST:source=on is NOT working

Sarnath · March 20, 2008, 5:13pm

The CUDA FAQ (in the “CUDA announcement and news” forum) says PTX code can be viewed side by side with C code by adding "--opencc-options -LIST:source=on" to the nvcc command line.

However, the generated PTX file still contains only assembly. When I enable the verbose option, I see that “-LIST:source=on” is being passed on the nvopencc command line…

I am compiling only a “debug” release.

Can someone tell me what else I should do to see PTX and C side by side?

Thank you,

DenisR · March 20, 2008, 6:37pm

Never heard of the option, but I am very keen to find out the answer to your question. I quickly get lost where I am in my C-code when looking at the ptx…

Sarnath · March 21, 2008, 8:18am

Yeah, me too. It would make our job much simpler and make it easier to train people on CUDA too…

This option is mentioned in CUDA FAQ 1.0 from Simon Green. Check out the “CUDA announcement and news” forum.

I think the NVIDIA guys are too busy with some tight schedules… I don’t see them answering any of the queries…

MisterAnderson42 · March 21, 2008, 1:27pm

Well, they have said that a new CUDA beta is likely by the end of the month: maybe they are working on that :)

Regarding the side by side C and PTX, I’m not sure what problem you are having. Maybe you aren’t specifying -keep or -ptx to keep the generated ptx?

test.cu

__global__ void kernel(float *d_out)
    {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    float a = idx*0.05f + 4.2f;
    for (int i = 0; i < idx; i++)
        a += 0.01f;

    d_out[idx] = a;
    }

nvcc -keep --opencc-options -LIST:source=on test.cu

test.ptx (abbreviated)

//   1  __global__ void kernel(float *d_out)
$LBB1__Z6kernelPf:
    .loc    12  5   0
 //   2     {
 //   3     int idx = blockIdx.x * blockDim.x + threadIdx.x;
 //   4
 //   5     float a = idx*0.05f + 4.2f;
    mov.u16     $rh1, %ctaid.x;         //
    mov.u16     $rh2, %ntid.x;          //
    mul.wide.u16    $r1, $rh1, $rh2;    //
    cvt.u32.u16     $r2, %tid.x;        //
    add.u32     $r3, $r2, $r1;          //
    mov.f32     $f1, 0f40866666;        //  4.2
    cvt.rn.f32.s32  $f2, $r3;       //
    mov.f32     $f3, 0f3d4ccccd;        //  0.05
    mad.f32     $f4, $f2, $f3, $f1;     //
    mov.s32     $r4, 0;                 //
    setp.le.s32     $p1, $r3, $r4;      //
    @$p1 bra    $Lt_0_5;                //
    mov.s32     $r5, $r3;               //
    mov.s32     $r6, 0;                 //
    mov.s32     $r7, $r5;               //
$Lt_0_7:
 //<loop> Loop body line 5, nesting depth: 1, estimated iterations: unknown
    .loc    12  7   0
 //   6     for (int i = 0; i < idx; i++)
 //   7         a += 0.01f;
    mov.f32     $f5, 0f3c23d70a;        //  0.01
    add.f32     $f4, $f4, $f5;          //
    add.s32     $r6, $r6, 1;            //
    setp.ne.s32     $p2, $r6, $r3;      //
    @$p2 bra    $Lt_0_7;                //
$Lt_0_5:
    .loc    12  9   0
 //   8
 //   9     d_out[idx] = a;
    ld.param.u64    $rd1, [__cudaparm__Z6kernelPf_d_out];   //  id:25 __cudaparm__Z6kernelPf_d_out+0x0
    cvt.u64.s32     $rd2, $r3;          //
    mul.lo.u64  $rd3, $rd2, 4;      //
    add.u64     $rd4, $rd1, $rd3;       //
    st.global.f32   [$rd4+0], $f4;  //  id:26
    exit;                           //
    } // _Z6kernelPf

This is with CUDA 1.1 btw.

Sarnath · March 22, 2008, 8:55am

Thanks for your reply. I am invoking it with the -keep option only… But I invoke it from the VS2005 environment. The “-v” option shows that -LIST:source=on is being passed to nvopencc… I am really not sure what is happening here… :-(

Let me retry it from the command line and see if it makes a difference.

Appreciate your time on this.

And yes, I am on 1.1 too… I transitioned long ago… Nice to know another CUDA release is around the corner.

Best Regards,
Sarnath

MisterAnderson42 · March 22, 2008, 1:01pm

I’ve only ever tried the list source mode on Linux, so that might be another factor.

Sarnath · March 23, 2008, 7:13am

Ah, I see. Can someone try it on Windows and confirm the behaviour I saw?

At least we can hope to get this fixed by the next release…

Best Regards,

Sarnath

m11 · March 23, 2008, 11:03am

works in winXP as well.

nvcc.exe -link -ccbin "C:\Program Files\Microsoft Visual Studio 8\VC\bin" -DWIN32 -D_DEBUG -D_CONSOLE -Xcompiler "/EHsc /W3 /nologo /Wp64 /Od /Zi /MDd /GR" -I"C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\common\inc" -I"C:\CUDA\include" --opencc-options -LIST:source=on -keep kern.cu main.cpp

An (unfinished) bitonic sort for 1024 elements:

the two attached files generate over 1 MB of listings, and PTX is present as well.
kern.txt (1.43 KB)
main.cpp (638 Bytes)

Sarnath · March 23, 2008, 12:31pm

Well, you say PTX is present. But does it have C mixed with PTX, i.e. C and PTX side by side as shown in MisterAnderson42’s post?

m11 · March 23, 2008, 6:03pm

Is this the listing you are looking for?
Did you try my example?

…
…
// Loop body line 25, nesting depth: 1, estimated iterations: unknown
.loc 13 31 0
// 27 // Parallel bitonic sort.
// 28 for (int k = 2; k <= 512; k<<=1) // k *= 2)
// 29 {
// 30 // Bitonic merge:
// 31 for (int j = k / 2; j>0; j>>=1) // j /= 2)
div.s32 $r11, $r10, 2; //
mov.s32 $r12, $r11; //
mov.s32 $r13, 0; //
setp.le.s32 $p1, $r11, $r13; //
@$p1 bra $Lt_0_24; //
$Lt_0_26:
// Loop body line 31
xor.b32 $r14, $r12, $r3; //
setp.le.u32 $p2, $r14, $r3; //
@$p2 bra $Lt_0_27; //
// Part of loop body line 31, head labeled $Lt_0_26
mul.lo.u32 $r15, $r14, 4; //
ld.shared.f32 $f2, [$r6+0]; // id:77 __cuda_shared8+0x0
add.u32 $r16, $r15, $r1; //

…
…

Sarnath · March 24, 2008, 6:41am


Sorry guys. When I removed “#pragma unroll”, C and PTX started appearing side by side…

With the “#pragma” present, the mixed C and PTX listing is NOT generated for that for loop.

I think that’s probably understandable…

My mistake, guys… Sorry for troubling you all.

Thanks to “.m.” and MisterAnderson42 for their time.

Best Regards,

Sarnath
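
A minimal sketch of how one might act on this finding, assuming the listing really is suppressed only for loops carrying “#pragma unroll”: guard the pragma with a preprocessor macro so it can be switched off for listing builds. The macro name LISTING_BUILD and the file name are made up for illustration and are not nvcc features; the kernel is adapted from the test.cu example earlier in the thread. The compiler may still unroll a small constant-trip loop on its own, so this is not guaranteed to restore the listing.

```cuda
// test_unroll.cu (hypothetical file name)
//
// Listing build, using the flags from this thread plus the assumed macro:
//   nvcc -DLISTING_BUILD -keep --opencc-options -LIST:source=on test_unroll.cu
// Normal build (pragma active):
//   nvcc -keep test_unroll.cu

__global__ void kernel(float *d_out)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float a = idx * 0.05f + 4.2f;

    // LISTING_BUILD is a user-defined macro (an assumption of this sketch).
    // When it is defined, the unroll pragma is left out of the translation unit.
#ifndef LISTING_BUILD
#pragma unroll
#endif
    for (int i = 0; i < 8; i++)
        a += 0.01f;

    d_out[idx] = a;
}
```

Since the pragma changes the generated code, the PTX from a listing build is only a guide to the structure of the loop, not the exact code of the normal build.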

RoofTopG · November 30, 2011, 11:18pm

nvcc warning : Option '--opencc-options (-Xopencc)' is obsolete and ignored

What’s the current way to display C alongside PTX (CUDA 4.1)? Thanks!

njuffa · December 1, 2011, 12:03am

Note that the “mixed listing” feature was tied to a component-specific flag. As CUDA 4.1 introduces a new frontend for sm_2x and up, that component has been replaced, and thus the flag is no longer accepted. In general, component-specific flags tend to be unsupported, and I would advise against their use in production builds.

There does not seem to be an equivalent “mixed listing” functionality provided via the new frontend. Sorry for the inconvenience. If you find the “mixed listing” capability useful (I was not aware of this feature and thus do not know what the listing looks like) and would like to see it reinstated, I would suggest filing a feature request through the bug reporting form. This is accessible via the registered developer website at partners.nvidia.com. There is a link in a menu on the left-hand side of the screen.

RoofTopG · December 1, 2011, 12:19pm

OK, thanks for the quick reply. For me it was useful when comparing the PTX output of two slightly different versions of the same source code: it helps me quickly find what to look at. There are probably plenty of other ways to achieve that too, so it’d be great if somebody more experienced with PTX could give some tips!

Btw, is there maybe some workaround to display combined C+PTX?

RoofTopG · December 1, 2011, 12:22pm

P.S. Are there named barriers in CUDA 4.1? I’d insert two barriers, A and B, around the interesting region of the CUDA code and search for them in the matching PTX.

tera · December 1, 2011, 2:34pm

If you just want to use them as markers, you could make your own by inserting inline PTX “assembler” comments:

asm volatile ("// this is line ...");
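
As a rough sketch of the idea, here is the kernel from earlier in the thread with two such comments placed around the loop. The marker strings (MARKER_A, MARKER_B) are made up for illustration. The optimizer is still free to move ordinary arithmetic across the inline asm statements, so the markers should be read as approximate boundaries rather than exact ones.

```cuda
// test.cu: kernel from earlier in this thread, with marker comments added.
// Generate PTX only and then search the output for the marker text:
//   nvcc -ptx test.cu
//   (look for "MARKER_A" and "MARKER_B" in test.ptx)

__global__ void kernel(float *d_out)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float a = idx * 0.05f + 4.2f;

    // The string is copied verbatim into the generated PTX. Because it
    // starts with "//", the PTX assembler treats it as a comment.
    asm volatile ("// MARKER_A: loop region begins");

    for (int i = 0; i < idx; i++)
        a += 0.01f;

    asm volatile ("// MARKER_B: loop region ends");

    d_out[idx] = a;
}
```

Comparing two versions of a kernel then comes down to diffing the PTX between the same pair of markers in each file.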

RoofTopG · December 1, 2011, 2:36pm

thanks tera, I’ll try that out!
