preventing nvcc name mangling for use with python ctypes

khosra · May 16, 2009, 12:02pm

I am using pystream to use CUDA from Python. Although the library is no longer actively developed, it's nicely written and quite usable.

I am having trouble compiling kernels with nvcc and then loading them with ctypes, because the names are mangled. pystream's test directory has an example that uses the kernel from the SDK matrixMul example. I can get it to work if I hardcode the mangled name in my Python code, but I was wondering if there is a way around this.
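For reference, the mangled name follows the Itanium C++ ABI mangling that g++ uses: `_Z`, the length of the identifier in decimal, the identifier itself, and then the parameter types (`PfS_S_ii` encodes `float*, float*, float*, int, int`). A minimal sketch that recovers the bare name in Python, assuming the kernel is a plain top-level function (no namespace, no template); `demangle_simple_name` is a hypothetical helper, not part of pystream or ctypes:

```python
import re

# Start of an Itanium-ABI mangled name for a plain, non-namespaced
# function: "_Z" followed by the identifier length in decimal.
_SIMPLE_MANGLED = re.compile(r"_Z(\d+)")


def demangle_simple_name(symbol):
    """Return the bare function name from e.g. '_Z9matrixMulPfS_S_ii'.

    Only handles top-level functions; returns None for anything else
    (namespaced '_ZN...E' names, unmangled C names, malformed input).
    """
    m = _SIMPLE_MANGLED.match(symbol)
    if m is None:
        return None
    length = int(m.group(1))
    start = m.end()
    name = symbol[start:start + length]
    # Guard against a length prefix longer than the remaining string.
    if len(name) != length:
        return None
    return name


print(demangle_simple_name("_Z9matrixMulPfS_S_ii"))  # matrixMul
```

For namespaced or templated kernels, the `c++filt` tool from binutils does full demangling; the sketch above deliberately covers only the simple case.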

To build (Mac OS X):

nvcc -I/Developer/CUDA/projects/matrixMul -c -o matrixMul_kernel.cu_o matrixMul_kernel.cu

g++ -L/usr/local/cuda/lib -lcudart -lcuda -dynamiclib matrixMul_kernel.cu_o -o libmatrixMul.dylib

The mangling:

$ nm -g libmatrixMul.dylib | grep ___device_stub

00000ec6 T ___device_stub__Z9matrixMulPfS_S_ii
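The three leading underscores can be decoded. Mac OS X (Mach-O) prepends one `_` to every C-level symbol in nm output, while `dlsym`, and therefore ctypes, expects the name without it; judging from this output, what remains is nvcc's `__device_stub_` prefix followed by the mangled kernel name `_Z9matrixMulPfS_S_ii`. A small sketch of that split, where `parse_nm_stub` is a hypothetical helper and the `macho` flag is an assumption you would set to `False` on Linux, where nm shows no extra underscore:

```python
# nvcc's host-side stub prefix, as seen in the nm output above.
STUB_PREFIX = "__device_stub_"


def parse_nm_stub(nm_symbol, macho=True):
    """Split an nm symbol such as '___device_stub__Z9matrixMulPfS_S_ii'.

    Returns (dlsym_name, mangled_kernel_name), or None if the symbol
    is not a device stub.
    """
    name = nm_symbol
    # Mach-O adds one leading underscore to C-level symbols; dlsym()
    # and ctypes want the name without it.
    if macho and name.startswith("_"):
        name = name[1:]
    if not name.startswith(STUB_PREFIX):
        return None
    return name, name[len(STUB_PREFIX):]


print(parse_nm_stub("___device_stub__Z9matrixMulPfS_S_ii"))
# ('__device_stub__Z9matrixMulPfS_S_ii', '_Z9matrixMulPfS_S_ii')
```

The second element is the same string the question hardcodes as the attribute name on `matrixMul`.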

And here is how I have to call it in Python:

print('Calling kernel')
Mul = matrixMul._Z9matrixMulPfS_S_ii(grid, threads)
# rather than Mul = matrixMul.matrixMul(grid, threads)
Mul(dC.data, dA.data, dB.data, WA, WB)
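Nothing forces the mangled name to be written literally in the source: Python's `getattr` takes the attribute name as a string, and ctypes libraries also support `lib[name]`, so a name computed at runtime (for example from nm output) works just as well. The sketch below shows this with the C library's `strlen` as a stand-in, since it runs without CUDA. It assumes, without having checked pystream's source, that pystream's library object resolves kernels through ordinary attribute lookup, in which case `getattr(matrixMul, '_Z9matrixMulPfS_S_ii')(grid, threads)` would be equivalent to the hardcoded call:

```python
import ctypes
import ctypes.util

# Stand-in for the CUDA library: the C runtime. If find_library returns
# None, CDLL(None) on POSIX systems exposes the already-loaded symbols.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# The symbol name is an ordinary string chosen at runtime.
symbol = "strlen"
func = getattr(libc, symbol)  # libc[symbol] also works
func.restype = ctypes.c_size_t
func.argtypes = [ctypes.c_char_p]

print(func(b"matrixMul"))  # 9
```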

This is no big deal, but since I don't really know what I'm doing, I assume I'm doing several things wrong that have accidentally made this work.

I tried adding extern "C" to matrixMul in matrixMul_kernel.cu, and that didn't change anything. I also tried nvcc --host-compilation=C.

Thanks!

Tomas_Klacko · June 1, 2009, 2:04pm

I had a similar issue. You might want to try generating matrixMul_kernel.cu.cpp with the `nvcc --cuda matrixMul_kernel.cu` command. Then, in the generated .cpp file (which contains C++ source code), find the line where the definition of your matrixMul kernel function starts (I believe it should be something like `extern … matrixMul`) and replace it with `extern "C" … matrixMul`. This prevented name mangling in my case (I then just used g++ to compile the .cpp file into the .o file).

khosra · June 1, 2009, 4:11pm

In my case, nvcc --cuda generates extern declarations like the following, with the name already mangled:

extern void __device_stub__Z27ComplexPointwiseMulAndScaleP6float2PKS_if(Complex *, const Complex *, int, float);

Not sure if inserting extern "C" here would help.

In Python, I have this ugly hack for recovering the original names (relevant code cut and pasted below). There has to be a better way:

import subprocess

class KernelLibrary(object):  # hypothetical name: the class statement was not part of the pasted code

    def __init__(self, dll):
        self.dll = dll
        self._getnm()

    def _getnm(self):
        """
        Hack to make up for name mangling by nvcc.
        """
        # universal_newlines=True returns text, so splitting into lines
        # works under both Python 2 and Python 3.
        nm = subprocess.Popen(["/usr/bin/nm", "-gj", self.dll._name],
                              stdout=subprocess.PIPE,
                              universal_newlines=True).communicate()[0].splitlines()
        # Keep only the stubs and strip the 16-character Mac OS X prefix
        # '___device_stub__', leaving e.g. 'Z9matrixMulPfS_S_ii'.
        self.nm = [sym[16:] for sym in nm if sym.startswith('___device_stub__')]

    def __getattr__(self, name):
        # find name in symbol table, error if more than one match
        found = 0
        match = None
        for sym in self.nm:
            if sym.find(name) >= 0:
                found += 1
                match = sym
        if found > 1:
            raise AttributeError('Found %i matches for kernel %s in symbol table' % (found, name))
        if match is None:
            raise AttributeError('Kernel %s not found in symbol table' % name)
        mangled_name = '__device_stub__%s' % match

        try:
            funcptr = getattr(self.dll, mangled_name)
        except AttributeError:
            raise AttributeError("could not find kernel named %r in %r" % (name, self.dll))
        # Return a factory function that will create the Kernel object
        # (Kernel is pystream's kernel wrapper; its import is not shown).
        factory = lambda *args, **kwds: Kernel(funcptr, *args, **kwds)
        return factory
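One weakness of the hack above is the substring test (`find(name) >= 0`): asking for `matrixMul` would also match a second kernel whose name merely contains it, such as a hypothetical `matrixMulShared`, and trigger the "multiple matches" error. Below is a sketch of an exact-match alternative under the same assumptions as the hack (Mac OS X nm output, simple non-namespaced kernel names). It decodes the length-prefixed identifier and builds a dictionary once. `StubTable` and `lookup` are hypothetical names, and the symbol list is passed in rather than read from nm so the logic can be checked without a CUDA library:

```python
import re

STUB_PREFIX = "__device_stub_"
# Itanium-ABI prefix for a plain top-level function: "_Z" + identifier length.
_SIMPLE_MANGLED = re.compile(r"_Z(\d+)")


class StubTable(object):
    """Map bare kernel names to nvcc device-stub symbols by exact match."""

    def __init__(self, nm_symbols, macho=True):
        self.by_name = {}
        for sym in nm_symbols:
            # Drop the extra leading underscore Mach-O adds in nm output.
            if macho and sym.startswith("_"):
                sym = sym[1:]
            if not sym.startswith(STUB_PREFIX):
                continue
            mangled = sym[len(STUB_PREFIX):]
            m = _SIMPLE_MANGLED.match(mangled)
            if m is None:
                continue  # namespaced/templated kernels are not handled
            start = m.end()
            name = mangled[start:start + int(m.group(1))]
            # Overloaded kernels share a bare name, so keep a list.
            self.by_name.setdefault(name, []).append(sym)

    def lookup(self, name):
        """Return the dlsym/ctypes name of the stub for kernel `name`."""
        matches = self.by_name.get(name, [])
        if not matches:
            raise AttributeError("Kernel %s not found in symbol table" % name)
        if len(matches) > 1:
            raise AttributeError("Kernel %s is overloaded: %r" % (name, matches))
        return matches[0]


symbols = [
    "___device_stub__Z9matrixMulPfS_S_ii",
    "___device_stub__Z15matrixMulSharedPfS_S_ii",
    "_cudaMalloc",
]
table = StubTable(symbols)
print(table.lookup("matrixMul"))  # __device_stub__Z9matrixMulPfS_S_ii
```

With a table like this built from the raw `nm -gj` lines in `_getnm`, the search loop in `__getattr__` could shrink to `mangled_name = self.table.lookup(name)`. Ambiguity then only arises for genuinely overloaded kernels, which the exact-name approach cannot disambiguate without looking at the parameter types.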