Script to assist ELF disassembly (decuda for CUDA 3.x)

(This is a split from the nvcc lmem/alias analysis bug thread. We were getting off-topic.)

decuda is currently not useful for non-beta CUDA 3.x releases: the release compilers apparently no longer let you generate the old CUBIN format instead of ELF, and decuda (in wumpus’ words) is missing ELF support because of the lack of good ELF libraries for Python. I don’t have time to write a proper ELF parser at the moment, but here’s a stopgap: it uses objdump to generate an ASCII dump of the ELF file and parses that (paying attention to little details like endianness ;)) into an approximation of an old-style CUBIN. It’s certainly not perfect, and I offer no warranty whatsoever, but it seems to work for me. You’ll need objdump in your path. Call it as “python elfToCubin.py [ELF-style CUBIN file]” and it will dump an old-style CUBIN to stdout.
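To illustrate the endianness detail: objdump -s prints section bytes in file order, so each 8-hex-digit group has to be byte-reversed to get the little-endian 32-bit word that an old-style CUBIN lists. Below is a minimal sketch of that one step, written so it does not depend on the byte order of the machine running it. The helper names hex_group_to_word and format_words are made up for this example; this is not the script's own code, which uses socket.htonl instead.

```python
import binascii
import struct


def hex_group_to_word(group):
    """Convert one 8-hex-digit group from `objdump -s` into a 32-bit word,
    reading the four bytes as little-endian."""
    raw = binascii.unhexlify(group)      # the four bytes, in file order
    return struct.unpack("<I", raw)[0]   # "<I" = little-endian unsigned 32-bit


def format_words(groups):
    """Render groups the way an old-style CUBIN lists them (0x%08x words)."""
    return " ".join("0x%08x" % hex_group_to_word(g) for g in groups)


if __name__ == "__main__":
    print(format_words(["04000010", "fd070078"]))
```

Using struct with an explicit "<I" keeps the result the same whether the disassembly is run on a little-endian or a big-endian host.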

In addition to copying from the code box, you can download it here

Hopefully this will be useful for people both as a short-term stopgap until proper ELF support is built into decuda, and for decuda developers once a proper ELF parser can be put in place of calling objdump externally!

EDIT: The code below is out of date; grab the latest version from the link above. Leaving the old version below for posterity.

[codebox]
#!/usr/bin/python
# Copyright 2010, Imran Haque
# ihaque AT cs DOT stanford DOT edu
# http://cs.stanford.edu/people/ihaque
#
# This code is placed in the public domain and bears no warranty whatsoever
#
# Uses objdump to dump contents of an nvcc-generated ELF file
# and parses this output to generate an old-format CUBIN file
#
# Notable shortcomings:
#   kernel .info sections are not parsed (presumably contains memory/reg info)
#   no attempt is made to preserve names of constant sections
#   does not parse size or proper name/offset of constant sections
#   emits initializer data for file-constant variables

import sys
import socket
from subprocess import Popen,PIPE

sys.stderr.write("This script is out of date! Get the latest version at http://cs.stanford.edu/people/ihaque/elfTo...y\n")

class cubin:
    def __init__(self):
        self.kernels = []
        self.consts = []
        self.arch = "sm_10"
        self.abiversion = "1"
        self.modname = "cubin"

    def output(self,f):
        f.write("architecture {%s}\n"%self.arch)
        f.write("abiversion   {%s}\n"%self.abiversion)
        f.write("modname      {%s}\n"%self.modname)
        for c in self.consts:
            c.output(f)
        for k in self.kernels:
            k.output(f)

class constant:
    def __init__(self,name,data=None,depth=0,inkernel=False):
        self.name = name
        self.segname = "const"
        self.segnum = 0
        self.offset = 0
        self.bytes = 0
        self.depth = depth
        self.mem = None
        self.inkernel = inkernel
        if data is not None:
            self.mem = mem(data,self.depth+1)

    def output(self,f):
        prefix= "\t"*(self.depth+1)
        shortprefix= "\t"*(self.depth)
        if self.inkernel:
            f.write("%sconst {\n"%shortprefix)
        else:
            f.write("%sconsts {\n"%shortprefix)
            if self.name is not None:
                f.write("%s\t\tname    = %s\n"%(prefix,self.name))
        f.write("%s\t\tsegname = %s\n"%(prefix,self.segname))
        f.write("%s\t\tsegnum  = %d\n"%(prefix,self.segnum))
        f.write("%s\t\toffset  = %d\n"%(prefix,self.offset))
        f.write("%s\t\tbytes   = %d\n"%(prefix,self.bytes))
        if self.mem is not None:
            f.write("%smem {\n"%prefix)
            self.mem.output(f)
            f.write("%s}\n"%prefix)
        f.write("%s}\n"%shortprefix)

class kernel:
    def __init__(self,name,buf_text=None,buf_const=None,buf_info=None):
        self.name = name
        self.lmem = 0
        self.smem = 0
        self.reg = 0
        self.bar = 0
        self.offset = 0
        self.bytes = 0
        self.consts = []
        self.code = None
        if buf_text is not None:
            self.code = mem(buf_text)
        if buf_const is not None:
            self.consts = [constant("%s0"%self.name,buf_const,1,True)]
        if buf_info is not None:
            self.info = mem(buf_info,1)

    def output(self,f):
        f.write("code {\n")
        f.write("\tname = %s\n"%self.name)
        f.write("\tlmem = %d\n"%self.lmem)
        f.write("\tsmem = %d\n"%self.smem)
        f.write("\treg  = %d\n"%self.reg )
        f.write("\tbar  = %d\n"%self.bar )
        for c in self.consts:
            c.output(f)
        f.write("\tbincode {\n")
        self.code.output(f)
        f.write("\t}\n")
        f.write("}\n")

class mem:
    def __init__(self,data,depth=0,little_endian=True):
        self.le = little_endian
        self.data = []
        if data is not None:
            self.append(data)
        self.tabs = "\t"*(depth+2)
        return

    def append(self,data):
        for i in data:
            #print i
            x = int(i,16)
            if self.le:
                # This sort of assumes that the file was built on the same arch where it's being disasm'd
                x = socket.htonl(x)
            self.data.append(x)
        return

    def output(self,f):
        for i in range(0,len(self.data),4):
            datastrs = ["0x%08x"%x for x in self.data[i:i+4]]
            f.write("%s%s\n"%(self.tabs," ".join(datastrs)))
        return

def parse_objdump(objdump):
    # Parser states
    START        = 0
    INKERNEL     = 1
    INFILECONST  = 2
    INKCONST     = 3
    INKINFO      = 4
    lines = [x.strip() for x in objdump.split("\n")]
    format = lines[1].split("-")[-1]
    # NOTE: little_endian is detected here but never passed on to mem()
    little_endian = True
    if format != "little":
        little_endian = False
        # Warn on stderr so the CUBIN written to stdout stays clean
        sys.stderr.write("Warning, found a non-little-endian file\n")
    buffer_text = []
    buffer_const = []
    buffer_info = []
    state = START
    kname = None
    const_id = 0
    file = cubin()
    for line in lines[2:]:
        #print "%%:",line
        if len(line.strip()) == 0:
            continue
        if   state == START: #{{{
            if line.startswith("Contents of section .text"):
                kname = line.split(".")[-1].rstrip(":")
                state = INKERNEL
                continue
            elif len(line) == 0:
                continue
            else:
                raise ValueError("Got unexpected line in state START\n%s"%line)
        #}}}
        elif state == INKERNEL: #{{{
            if line.startswith("Contents of section .text"):
                # Handle new kernel
                # Store old kernel
                file.kernels.append(kernel(kname,buffer_text,buffer_const,buffer_info))
                buffer_text = []
                buffer_const = []
                buffer_info = []
                kname = line.split(".")[-1].rstrip(":")
                state = INKERNEL
                continue
            elif line.startswith("Contents of section .nv.constant"):
                # Handle const data
                if line.endswith("%s:"%kname):
                    state = INKCONST
                    continue
                else:
                    # This is a file constant section
                    # Store old kernel
                    state = INFILECONST
                    file.kernels.append(kernel(kname,buffer_text,buffer_const,buffer_info))
                    kname = None
                    buffer_text = []
                    buffer_const = []
                    buffer_info = []
                    continue
            elif line.startswith("Contents of section .nv.info"):
                # Handle kernel info
                state = INKINFO
                continue
            elif line[0].isdigit():
                # Handle new line of kernel binary
                # Take hex dump from objdump, remove address and ASCIIization
                buffer_text.extend(line[6:42].split())
            else:
                raise ValueError("Got unexpected line in state INKERNEL\n%s"%line)
        #}}}
        elif state == INFILECONST: #{{{
            if line.startswith("Contents of section .text"):
                # Handle new kernel
                # Store old file constant section
                file.consts.append(constant("constant%d"%const_id,buffer_const))
                buffer_const = []
                const_id = 0
                kname = line.split(".")[-1].rstrip(":")
                state = INKERNEL
                continue
            elif line.startswith("Contents of section .nv.constant"):
                # Handle const data
                # we should not be in a kernel here
                # TODO make sure not in kernel
                file.consts.append(constant("constant%d"%const_id,buffer_const))
                buffer_const = []
                const_id = 0
                state = INFILECONST
                continue
            elif line.startswith("Contents of section .nv.info"):
                # Handle kernel info
                raise ValueError("Error, got into an .nv.info section from a file const section")
            elif line[0].isdigit():
                # Handle new line of constant data
                # Take hex dump from objdump, remove address and ASCIIization
                buffer_const.extend(line[6:42].split())
            else:
                raise ValueError("Got unexpected line in state INFILECONST\n%s"%line)
        #}}}
        elif state == INKCONST: #{{{
            if line.startswith("Contents of section .text"):
                # Handle new kernel
                # Store old kernel
                file.kernels.append(kernel(kname,buffer_text,buffer_const,buffer_info))
                buffer_text = []
                buffer_const = []
                buffer_info = []
                const_id = 0
                kname = line.split(".")[-1].rstrip(":")
                state = INKERNEL
                continue
            elif line.startswith("Contents of section .nv.constant"):
                # Handle const data
                if line.endswith("%s:"%kname):
                    raise ValueError("Can't deal yet with kernels with multiple const sections")
                else:
                    # This is a file constant section
                    # Store old kernel (was file.append, but cubin has no append method)
                    state = INFILECONST
                    file.kernels.append(kernel(kname,buffer_text,buffer_const,buffer_info))
                    kname = None
                    buffer_text = []
                    buffer_const = []
                    buffer_info = []
                    const_id = 0
                    continue
            elif line.startswith("Contents of section .nv.info"):
                # Handle kernel info
                state = INKINFO
                continue
            elif line[0].isdigit():
                # Handle new line of kernel constant data
                # Take hex dump from objdump, remove address and ASCIIization
                buffer_const.extend(line[6:42].split())
            else:
                raise ValueError("Got unexpected line in state INKCONST\n%s"%line)
        #}}}
        elif state == INKINFO: #{{{
            if line.startswith("Contents of section .text"):
                # Handle new kernel
                # Store old kernel
                file.kernels.append(kernel(kname,buffer_text,buffer_const,buffer_info))
                buffer_text = []
                buffer_const = []
                buffer_info = []
                const_id = 0
                kname = line.split(".")[-1].rstrip(":")
                state = INKERNEL
                continue
            elif line.startswith("Contents of section .nv.constant"):
                # Handle const data
                if line.endswith("%s:"%kname):
                    state = INKCONST
                    continue
                else:
                    # This is a file constant section
                    # Store old kernel
                    state = INFILECONST
                    file.kernels.append(kernel(kname,buffer_text,buffer_const,buffer_info))
                    kname = None
                    buffer_text = []
                    buffer_const = []
                    buffer_info = []
                    continue
            elif line.startswith("Contents of section .nv.info"):
                # Handle kernel info
                raise ValueError("Can't handle kernels with multiple INFO sections")
            elif line[0].isdigit():
                # Handle new line of kernel info data
                # Take hex dump from objdump, remove address and ASCIIization
                buffer_info.extend(line[6:42].split())
            else:
                raise ValueError("Got unexpected line in state INKINFO\n%s"%line)
        #}}}
        else:
            raise ValueError("Hit an invalid state in parser")
    # Flush whatever section was still open at end of input
    if kname is None:
        if len(buffer_const) > 0:
            file.consts.append(constant("constant%d"%const_id,buffer_const))
    else:
        file.kernels.append(kernel(kname,buffer_text,buffer_const,buffer_info))
    return file

if len(sys.argv) < 2:
    sys.stderr.write("Usage: elfToCubin [input file]\n")
    sys.exit(1)

# universal_newlines=True makes objdump's output come back as text
output = Popen(["objdump", "-s", sys.argv[1]], stdout=PIPE, universal_newlines=True).communicate()[0]
cubin = parse_objdump(output)
cubin.output(sys.stdout)
[/codebox]

Nice, this is a good intermediate solution. I guess it’d even work on Windows, given an objdump.exe for Windows.

Can I roll this into decuda?

Absolutely.

Great, thanks Imran. So now we can “decude” 3.1 cubins too…

By the way, welcome back Wumpus. :)

Yup! Note that although you can elfToCubin a sm_20 compiled file, it won’t decuda properly because decuda doesn’t yet have support for sm_20 style instructions.

Also, I’d suggest that you grab the script from the link in my first post rather than from the text box. I’m occasionally updating the script (made one change yesterday to add support for file-level .nv.info sections for sm_20), and I’ll update the web version but probably not the forum version.

If you ever find a way to make your script output in a format that can be parsed by the disassemblers from the Nouveau folks (nv50dis/nvc0dis), we would be able to disassemble Fermi instructions, and that would be great. :)

(Basically, you would just need to remove the cubin headers…)
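As a rough sketch of what "removing the cubin headers" could look like: the function below pulls just the bincode words for each kernel out of old-style CUBIN text laid out the way the script above writes it. The name extract_bincode is made up for this example, and how the words then have to be fed to nv50dis/nvc0dis is not covered here.

```python
def extract_bincode(cubin_text):
    """Return {kernel_name: [32-bit words]} from old-style CUBIN text.

    Assumes the layout written by the script above: a "name = ..." line
    inside each code block, then "bincode {", lines of 0x-prefixed words,
    and a closing "}".
    """
    kernels = {}
    name = None
    in_bincode = False
    for line in cubin_text.splitlines():
        line = line.strip()
        if line.startswith("name = "):
            name = line[len("name = "):]
        elif line.startswith("bincode {"):
            in_bincode = True
            kernels[name] = []
        elif in_bincode and line == "}":
            in_bincode = False
        elif in_bincode:
            # int(..., 16) accepts the leading "0x"
            kernels[name].extend(int(word, 16) for word in line.split())
    return kernels


if __name__ == "__main__":
    sample = ("code {\n\tname = k\n\tlmem = 0\n\tbincode {\n"
              "\t\t0x10000004 0x780007fd\n\t}\n}\n")
    print(extract_bincode(sample))
```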

Done and done ;). The latest version adds support for a command-line option “--nouveau”. To use it, you should have both nv50dis and nvc0dis in your path. Given an ELF cubin file and the --nouveau option, the script extracts each kernel and, depending on whether the file’s flags indicate a -code target below sm_20 or of sm_20, automatically passes it to nv50dis or nvc0dis respectively:

ihaque@dev:/tmp$ python /home/ihaque/packages/decuda/elfToCubin.py --nouveau testclock.cubin_20

--> Disassembling kernel  _Z9testclockPl with nvc0dis

00000000: 2800440400005d04 l3 mov b32 $r1 c1[0x100]

00000008: 2c00000140001c04 mov b32 $r0 clock

00000010: 6000c00004001e03 shl b32 $r0 $r0 0x1 [unknown: 0000000000000200]

00000018: 2c00000140009c04 mov b32 $r2 clock

00000020: 6000c00004209e03 shl b32 $r2 $r2 0x1 [unknown: 0000000000000200]

00000028: 4800000000201d03 sub b32 $r0 $r2 $r0

00000030: 2800400080009d04 l3 mov b32 $r2 c0[0x20]

00000038: 9000000000201c85 st b32 wb g[$r2+0] $r0

00000040: 8000000000001d07 ??? 0x48 [unknown: 8000000000000100]

nvc0dis is not perfect yet for Fermi (as all the “unknowns” show), and the output of both disassemblers is slightly different from the decuda output we know and love, but it’s better than nothing. Cheers!
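The dispatch rule described above can be sketched as follows. This is only an illustration, not the script's actual code: the function name pick_disassembler is made up, and it takes an architecture string such as "sm_13" as input, whereas the real script reads the target from the ELF file's flags (that decoding is not shown here).

```python
import re


def pick_disassembler(arch):
    """Return the Nouveau disassembler to use for an arch string like 'sm_13'."""
    match = re.match(r"sm_(\d+)$", arch)
    if match is None:
        raise ValueError("unrecognized architecture: %r" % arch)
    # Assumption: sm_20 and above is Fermi (nvc0dis); anything lower is nv50dis
    if int(match.group(1)) >= 20:
        return "nvc0dis"
    return "nv50dis"


if __name__ == "__main__":
    for arch in ("sm_10", "sm_13", "sm_20"):
        print("%s -> %s" % (arch, pick_disassembler(arch)))
```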

Excellent!

Thanks a lot.

Great work! Thanks!

Vasily

Is this project alive? The site at http://0x04.net/cgit/index.cgi/nv50dis seems dead :(

Yes, the project has moved to http://github.com/pathscale/envytools and the executable is now called envydis.

I am using decuda to extract assembly code for the SDK examples in CUDA 2.1. I am using a GeForce 9800 GT.

I am getting the following error:

Traceback (most recent call last):
  File "decuda.py", line 92, in <module>
    main()
  File "decuda.py", line 55, in main
    cu = load(args[0])
  File "/laanwj-decuda-c30bd17/CubinFile.py", line 258, in load
    inst = [int(x,0) for x in inst]
ValueError: invalid literal for int() with base 0: '/*'

Can somebody help?