I am using pystream to use CUDA from Python. Although the library is no longer being developed, it is nicely written and usable.
I am having trouble compiling kernels with nvcc and then loading them with ctypes, because the names are mangled. In pystream's test directory there is an example that uses the SDK matrixMul example kernel. I can get it to work if I hardcode the mangled name in my Python code. Is there a way around this?
$ nm -g libmatrixMul.dylib | grep ___device_stub
00000ec6 T ___device_stub__Z9matrixMulPfS_S_ii
And this is how I have to call it in Python:
print 'Calling kernel'
Mul = matrixMul._Z9matrixMulPfS_S_ii(grid, threads)
# rather than Mul = matrixMul.matrixMul(grid, threads)
Mul(dC.data, dA.data, dB.data, WA, WB)
This is no big deal, but given that I don't know what I am doing, I assume I am doing several things wrong that have accidentally made this work.
I tried extern "C" for matrixMul in matrixMul_kernel.cu, and that didn't change anything. I also tried nvcc --host-compilation=C.
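For reference, the mangled name in the nm output above follows the usual g++ (Itanium ABI) scheme for a plain function: `_Z`, then the length of the identifier (9), then the identifier itself (matrixMul), then the parameter types (`Pf` is float*, `S_` repeats it, `i` is int). A minimal sketch of pulling the plain name back out, assuming only this simple form (no namespaces or templates); the helper name `unmangled_name` is made up for illustration and is not part of pystream:

```python
import re

# Matches only the simplest mangled names: "_Z", a decimal length, then the
# identifier followed by the parameter encoding. Nested or templated names
# (which start with "_ZN" and so on) are deliberately not handled.
_SIMPLE_MANGLED = re.compile(r'_Z(\d+)(\w+)$')


def unmangled_name(symbol):
    """Return the plain function name inside a simple mangled symbol, or None."""
    m = _SIMPLE_MANGLED.search(symbol)
    if m is None:
        return None
    length = int(m.group(1))
    rest = m.group(2)
    if len(rest) < length:
        return None
    # The first `length` characters are the identifier; the remainder
    # encodes the parameter types.
    return rest[:length]
```

Because it searches rather than matches from the start, the same helper also accepts the full stub symbol that nm prints.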
I had a similar issue. You might want to try generating matrixMul_kernel.cu.cpp with the 'nvcc --cuda matrixMul_kernel.cu' command. Then, in the generated cpp file (which contains C++ source code), find the line where the definition of your matrixMul kernel function starts (I believe it should be something like 'extern … matrixMul') and replace it
with 'extern "C" … matrixMul'. This prevented name mangling in my case
(I then just used g++ to compile the cpp file into the o file).
In Python, I have this bad hack for recovering the original names (cutting and pasting the relevant code). There has to be a better way:
def __init__(self, dll):
    self.dll = dll
    self._getnm()

def _getnm(self):
    """
    Hack to make up for name mangling by nvcc.
    """
    import subprocess
    nm = subprocess.Popen(["/usr/bin/nm", "-gj", self.dll._name],
                          stdout=subprocess.PIPE).communicate()[0].split('\n')
    # Keep only the device stubs and strip the 16-character prefix.
    self.nm = [line[16:] for line in nm if line.startswith('___device_stub__')]

def __getattr__(self, name):
    # find name in symbol table, error if more than one match
    found = 0
    match = None
    for i in self.nm:
        if i.find(name) >= 0:
            found += 1
            match = i
    if found > 1:
        raise AttributeError('Found %i matches for kernel %s in symbol table' % (found, name))
    if match is None:
        raise AttributeError('Kernel %s not found in symbol table' % name)
    mangled_name = '__device_stub__%s' % match
    try:
        funcptr = getattr(self.dll, mangled_name)
    except AttributeError:
        raise AttributeError("could not find kernel named %r in %r" % (name, self.dll))
    # Return a factory function that will create the Kernel object.
    factory = lambda *args, **kwds: Kernel(funcptr, *args, **kwds)
    return factory
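One weakness of the substring search above is that looking up matrixMul would also hit a kernel called, say, matrixMulTiled. A possible refinement, sketched here under assumptions rather than tested against pystream: parse the nm lines once into a table keyed by the exact kernel name. The names `build_kernel_table` and `lookup` are hypothetical, and the sketch assumes the symbol is the last field on each nm line, that only simple `_Z<length><name>` manglings occur, and that any extra leading underscore (as in the OS X output above) should be dropped before handing the symbol to ctypes.

```python
import re

# Optional platform underscore(s), then the stub symbol itself.
# Group 1: symbol as it would be looked up through ctypes (an assumption).
# Group 2: length of the kernel name. Group 3: name plus parameter encoding.
_STUB = re.compile(r'^_*(__device_stub__Z(\d+)(\w+))$')


def build_kernel_table(nm_lines):
    """Map each exact kernel name to the list of stub symbols that define it."""
    table = {}
    for line in nm_lines:
        fields = line.split()
        if not fields:
            continue
        m = _STUB.match(fields[-1])
        if m is None:
            continue
        length = int(m.group(2))
        name = m.group(3)[:length]
        table.setdefault(name, []).append(m.group(1))
    return table


def lookup(table, name):
    """Return the single stub symbol for `name`, or raise AttributeError."""
    symbols = table.get(name, [])
    if not symbols:
        raise AttributeError('Kernel %s not found in symbol table' % name)
    if len(symbols) > 1:
        raise AttributeError('Found %i overloads for kernel %s'
                             % (len(symbols), name))
    return symbols[0]
```

With a table like this, __getattr__ could call lookup(table, name) and pass the result to getattr(self.dll, ...) as before; overloaded kernels would still need to be told apart by their parameter encoding.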