cuobjdump now available to registered developers

cuobjdump supports disassembly of .cubin files built for sm_10 through sm_13 and is now available to registered developers. If you drop the binary in your CUDA bin directory on Linux, you can then disassemble running applications using the disas command.

Hey, an official decuda clone. That’s great news! I guess that’ll make me register as a developer then.
Thanks!

I believe the PDF file was not rendered correctly. It ends with a copy of the Supported Options section and then the message:

ERROR: undefined
OFFENDING COMMAND: f‘~
STACK:

Does the PDF contain all the information it was intended to contain?

The broken PDF is a known issue, and an updated package should be available shortly.

It is unclear to me how to use cuobjdump. I tried

cuobjdump.exe -sass test_kernel.compute_20.sm_20.cubin

The PDF indicates that this would output disassembled code, but I only get code, const and bincode blocks.

The PDF doesn’t describe what the difference between the positional argument and the --file option is. I tried calls such as the ones below without getting any output (not even error messages):

cuobjdump.exe -fnam test_kernel.compute_20.sm_20.cubin
cuobjdump.exe -fnam -f test_kernel.compute_20.sm_20.cubin
cuobjdump.exe -cubin -fun _Z13test_kernelPfyiif test_kernel.compute_20.sm_20.cubin

cuobjdump doesn’t support sm_20, only sm_10 through sm_13.
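
As a quick pre-check before running the tool, the architecture can be read off the file name. This is only a minimal sketch: it assumes the file follows the `name.compute_XX.sm_XX.cubin` naming seen in this thread, and the helper names `sm_version` and `is_supported` are hypothetical, not part of cuobjdump.

```python
import re

# Range stated in this thread: sm_10 through sm_13.
SUPPORTED_SM = range(10, 14)


def sm_version(filename):
    """Return the number from an 'sm_XX' token in the file name, or None."""
    match = re.search(r"\bsm_(\d+)\b", filename)
    return int(match.group(1)) if match else None


def is_supported(filename):
    """True if the file name names an architecture in the supported range."""
    version = sm_version(filename)
    return version is not None and version in SUPPORTED_SM


if __name__ == "__main__":
    for name in ("hw.compute_10.sm_10.cubin",
                 "test_kernel.compute_20.sm_20.cubin"):
        print(name, "->", "supported" if is_supported(name) else "not supported")
```

The check only looks at the name, so a renamed file would fool it; it is meant as a sanity check, not a substitute for what the tool itself reports.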

Nice! cuobjdump also seems to accept .exe files, not just .cubin files, and at least dumps the PTX. However, while I can disassemble a .cubin file, -sass doesn’t seem to show the disassembly of that same code when it is embedded in the .exe. I’m not sure why, since the assembled code does appear to be in the .exe. See the output below.

d -ptx -sass -elf hw.compute_10.sm_10.cubin

.section .strtab STRTAB

.section .shstrtab STRTAB

.section .symtab SYMTAB

index value size info other shndx name

0 0 0 0 0 0 (null)

1 0 0 3 0 1 .shstrtab

2 0 0 3 0 2 .strtab

3 0 0 3 0 3 .symtab

4 0 0 3 0 0

5 0 0 3 0 0

6 0 24 3 0 4 .text._Z3funPi

7 0 0 3 0 6 .nv.shared._Z3funPi

8 0 0 3 0 5 .nv.info._Z3funPi

9 0 0 12 10 4 _Z3funPi

.nv.shared._Z3funPi NOBITS

No data to be dumped.

.text._Z3funPi PROGBITS

bar = 0 reg = 2 lmem=0 smem=20

0x1000c801 0x0423c780 0x10018005 0x00000003

0xd00e0005 0xa0c00781

.nv.info._Z3funPi PROGBITS

    <0x1>

    Attribute:      EIATTR_SMEM_PARAM_OFFSETS

    Format: EIFMT_SVAL

    Value:  0x0

code for sm_10

    --------------

            Function : _Z3funPi

    /*0000*/        MOV R0, g [0x4];

    /*0008*/        MVI R1, 0x1;

    /*0010*/        GST.U32 global14 [R0], R1;

            ...................

d -ptx -sass -elf …/Debug/hw.exe

c:/Personal/tem/cuda-memory-debug/hw/hw.cu:

==========================================

Version = 0x00000004

gpuInfoVersion = 0xa14f518d

key = 0d5c85b21bfe3b0c

usageMode = -maxrregcount=32

debuggable = no

ptx code for compute_20

    -----------------------

                    .version 2.1

                    .target sm_20

                    // compiled with C:\CUDA\bin/../open64/lib//be.exe

                    // nvopencc 3.1 built on 2010-06-08

//-----------------------------------------------------------

                    // Compiling hw.compute_20.cpp3.i (C:/Users/Ken/AppData/Local/Temp/ccBI#.a07412)

                    //-----------------------------------------------------------

//-----------------------------------------------------------

                    // Options:

                    //-----------------------------------------------------------

                    //  Target:ptx, ISA:sm_20, Endian:little, Pointer Size:32

                    //  -O3 (Optimization level)

                    //  -g0 (Debug level)

                    //  -m2 (Report advisories)

                    //-----------------------------------------------------------

.file 1 "hw.compute_20.cudafe2.gpu"

                    .file   2       "C:\Program Files\Microsoft Visual Studio 9.0\VC\INCLUDE\crtdefs.h"

                    .file   3       "C:\CUDA\include\crt/device_runtime.h"

                    .file   4       "C:\CUDA\include\host_defines.h"

                    .file   5       "C:\CUDA\include\builtin_types.h"

                    .file   6       "c:\cuda\include\device_types.h"

                    .file   7       "c:\cuda\include\driver_types.h"

                    .file   8       "c:\cuda\include\surface_types.h"

                    .file   9       "c:\cuda\include\texture_types.h"

                    .file   10      "c:\cuda\include\vector_types.h"

                    .file   11      "c:\cuda\include\builtin_types.h"

                    .file   12      "c:\cuda\include\host_defines.h"

                    .file   13      "C:\CUDA\include\device_launch_parameters.h"

                    .file   14      "c:\cuda\include\crt\storage_class.h"

                    .file   15      "C:\Program Files\Microsoft Visual Studio 9.0\VC\INCLUDE\time.h"

                    .file   16      "c:\cuda\include\texture_fetch_functions.h"

                    .file   17      "C:\CUDA\include\common_functions.h"

                    .file   18      "c:\cuda\include\math_functions.h"

                    .file   19      "c:\cuda\include\math_constants.h"

                    .file   20      "c:\cuda\include\device_functions.h"

                    .file   21      "c:\cuda\include\sm_11_atomic_functions.h"

                    .file   22      "c:\cuda\include\sm_12_atomic_functions.h"

                    .file   23      "c:\cuda\include\sm_13_double_functions.h"

                    .file   24      "c:\cuda\include\sm_20_atomic_functions.h"

                    .file   25      "c:\cuda\include\sm_20_intrinsics.h"

                    .file   26      "c:\cuda\include\surface_functions.h"

                    .file   27      "c:\cuda\include\math_functions_dbl_ptx3.h"

                    .file   28      "c:/Personal/tem/cuda-memory-debug/hw/hw.cu"

.entry _Z3funPi (

                            .param .u32 __cudaparm__Z3funPi_mem)

                    {

                    .reg .u32 %r<4>;

                    .loc    28      4       0

            $LDWbegin__Z3funPi:

                    .loc    28      6       0

                    mov.s32         %r1, 1;

                    ld.param.u32    %r2, [__cudaparm__Z3funPi_mem];

                    st.global.s32   [%r2+0], %r1;

                    .loc    28      7       0

                    exit;

            $LDWend__Z3funPi:

                    } // _Z3funPi

ptx code for compute_10

    -----------------------

                    .version 1.4

                    .target sm_10, map_f64_to_f32

                    // compiled with C:\CUDA\bin/../open64/lib//be.exe

                    // nvopencc 3.1 built on 2010-06-08

//-----------------------------------------------------------

                    // Compiling hw.compute_10.cpp3.i (C:/Users/Ken/AppData/Local/Temp/ccBI#.a04652)

                    //-----------------------------------------------------------

//-----------------------------------------------------------

                    // Options:

                    //-----------------------------------------------------------

                    //  Target:ptx, ISA:sm_10, Endian:little, Pointer Size:32

                    //  -O3 (Optimization level)

                    //  -g0 (Debug level)

                    //  -m2 (Report advisories)

                    //-----------------------------------------------------------

.file 1 "hw.compute_10.cudafe2.gpu"

                    .file   2       "C:\Program Files\Microsoft Visual Studio 9.0\VC\INCLUDE\crtdefs.h"

                    .file   3       "C:\CUDA\include\crt/device_runtime.h"

                    .file   4       "C:\CUDA\include\host_defines.h"

                    .file   5       "C:\CUDA\include\builtin_types.h"

                    .file   6       "c:\cuda\include\device_types.h"

                    .file   7       "c:\cuda\include\driver_types.h"

                    .file   8       "c:\cuda\include\surface_types.h"

                    .file   9       "c:\cuda\include\texture_types.h"

                    .file   10      "c:\cuda\include\vector_types.h"

                    .file   11      "c:\cuda\include\builtin_types.h"

                    .file   12      "c:\cuda\include\host_defines.h"

                    .file   13      "C:\CUDA\include\device_launch_parameters.h"

                    .file   14      "c:\cuda\include\crt\storage_class.h"

                    .file   15      "C:\Program Files\Microsoft Visual Studio 9.0\VC\INCLUDE\time.h"

                    .file   16      "c:\cuda\include\texture_fetch_functions.h"

                    .file   17      "C:\CUDA\include\common_functions.h"

                    .file   18      "c:\cuda\include\math_functions.h"

                    .file   19      "c:\cuda\include\math_constants.h"

                    .file   20      "c:\cuda\include\device_functions.h"

                    .file   21      "c:\cuda\include\sm_11_atomic_functions.h"

                    .file   22      "c:\cuda\include\sm_12_atomic_functions.h"

                    .file   23      "c:\cuda\include\sm_13_double_functions.h"

                    .file   24      "c:\cuda\include\sm_20_atomic_functions.h"

                    .file   25      "c:\cuda\include\sm_20_intrinsics.h"

                    .file   26      "c:\cuda\include\surface_functions.h"

                    .file   27      "c:\cuda\include\math_functions_dbl_ptx1.h"

                    .file   28      "c:/Personal/tem/cuda-memory-debug/hw/hw.cu"

.entry _Z3funPi (

                            .param .u32 __cudaparm__Z3funPi_mem)

                    {

                    .reg .u32 %r<4>;

                    .loc    28      4       0

            $LDWbegin__Z3funPi:

                    .loc    28      6       0

                    mov.s32         %r1, 1;

                    ld.param.u32    %r2, [__cudaparm__Z3funPi_mem];

                    st.global.s32   [%r2+0], %r1;

                    .loc    28      7       0

                    exit;

            $LDWend__Z3funPi:

                    } // _Z3funPi

code for sm_20

    --------------

            architecture {sm_20}

            abiversion  {1}

            cubinversion  {1}

            modname  {cubin}

            texmode  {texmode_unified}

            code {

                      name = _Z3funPi

                      lmem = 0

                      smem = 0

                      reg = 3

                      bar = 0

                      ctaidZUsed = 0

const {

                                      segname = const

                                      segnum = 0

                                      offset = 0

                                      bytes = 36

                      }

bincode {

                              0x00005de4 0x28004404 0x80001de4 0x28004000

                              0x04009de2 0x18000000 0x00009c85 0x90000000

                              0x00001de7 0x80000000

                              }

                  }

.section .strtab STRTAB

.section .shstrtab STRTAB

.section .symtab SYMTAB

index value size info other shndx name

0 0 0 0 0 0 (null)

1 0 0 3 0 1 .shstrtab

2 0 0 3 0 2 .strtab

3 0 0 3 0 3 .symtab

4 0 0 3 0 0

5 0 0 3 0 0

6 0 40 3 0 4 .text._Z3funPi

7 0 0 3 0 7 .nv.info

8 0 0 3 0 5 .nv.constant0._Z3funPi

9 0 0 3 0 6 .nv.info._Z3funPi

10 0 40 12 10 4 _Z3funPi

.nv.constant0._Z3funPi PROGBITS

0x00000000 0x00000000 0x00000000 0x00000000 0x00000000

0x00000000 0x00000000 0x00000000 0x00000000

.text._Z3funPi PROGBITS

bar = 0 reg = 3 lmem=0 smem=0

0x00005de4 0x28004404 0x80001de4 0x28004000

0x04009de2 0x18000000 0x00009c85 0x90000000

0x00001de7 0x80000000

.nv.info._Z3funPi PROGBITS

    <0x1>

    Attribute:      EIATTR_CBANK_PARAM_OFFSETS

    Format: EIFMT_SVAL

    Value:  0x0

    <0x2>

    Attribute:      EIATTR_PARAM_CBANK

    Format: EIFMT_SVAL

    Value:  0x8 0x40020

.nv.info PROGBITS

    <0x1>

    Attribute:      EIATTR_FRAME_SIZE

    Format: EIFMT_SVAL

    Value:  function: _Z3funPi(0xa) frame size: 0x0

code for sm_10

    --------------

            architecture {sm_10}

            abiversion  {1}

            cubinversion  {1}

            modname  {cubin}

            texmode  {texmode_unified}

            code {

                      name = _Z3funPi

                      lmem = 0

                      smem = 20

                      reg = 2

                      bar = 0

                      ctaidZUsed = 0

bincode {

                              0x1000c801 0x0423c780 0x10018005 0x00000003

                              0xd00e0005 0xa0c00781

                              }

                  }

.section .strtab STRTAB

.section .shstrtab STRTAB

.section .symtab SYMTAB

index value size info other shndx name

0 0 0 0 0 0 (null)

1 0 0 3 0 1 .shstrtab

2 0 0 3 0 2 .strtab

3 0 0 3 0 3 .symtab

4 0 0 3 0 0

5 0 0 3 0 0

6 0 24 3 0 4 .text._Z3funPi

7 0 0 3 0 6 .nv.shared._Z3funPi

8 0 0 3 0 5 .nv.info._Z3funPi

9 0 0 12 10 4 _Z3funPi

.nv.shared._Z3funPi NOBITS

No data to be dumped.

.text._Z3funPi PROGBITS

bar = 0 reg = 2 lmem=0 smem=20

0x1000c801 0x0423c780 0x10018005 0x00000003

0xd00e0005 0xa0c00781

.nv.info._Z3funPi PROGBITS

    <0x1>

    Attribute:      EIATTR_SMEM_PARAM_OFFSETS

    Format: EIFMT_SVAL

    Value:  0x0
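
To make the difference between the two dumps easier to spot, the saved output can be scanned for which "code for sm_XX" sections contain a "Function :" listing (as in the .cubin dump) and which only carry a bincode block (as in the .exe dump). This is a rough sketch that assumes the text layout shown above; the function name `sections_with_disassembly` and the sample strings are illustrative only.

```python
import re


def sections_with_disassembly(dump_text):
    """Map each 'code for sm_XX' section to True if it has a 'Function :' listing."""
    result = {}
    current = None
    for line in dump_text.splitlines():
        stripped = line.strip()
        header = re.fullmatch(r"code for (sm_\d+)", stripped)
        if header:
            # A new machine-code section starts here.
            current = header.group(1)
            result.setdefault(current, False)
        elif stripped.startswith("ptx code for"):
            # PTX sections are not machine code; stop attributing lines.
            current = None
        elif current and stripped.startswith("Function :"):
            result[current] = True
    return result


# Shortened excerpts of the two dumps from this post.
cubin_dump = """code for sm_10
    --------------
            Function : _Z3funPi
    /*0000*/        MOV R0, g [0x4];
"""
exe_dump = """ptx code for compute_10
code for sm_20
            architecture {sm_20}
bincode {
code for sm_10
            architecture {sm_10}
bincode {
"""

if __name__ == "__main__":
    print("cubin:", sections_with_disassembly(cubin_dump))
    print("exe:  ", sections_with_disassembly(exe_dump))
```

Run against the full dumps, a section that maps to False is one where only the raw bincode was printed.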

Packages have been updated with a fixed PDF. Sorry for the inconvenience.

Is there any problem with this command line? "cuobjdump.exe -ptx convolutionTexture.sm_13.cubin" It doesn’t output anything.

OK, the PTX is in the .exe file, but it is the same as the PTX produced with the --keep option. So to see the final optimizations, we need to look at the assembly itself, right?

Is cuobjdump part of some dev release (3.2 RC?) – or where can I download it?