I am using pystream to call CUDA from Python. Although the library is no longer being developed, it is nicely written and quite usable.
I am having trouble compiling kernels with nvcc and then loading them with ctypes, because the symbol names are mangled. In pystream's test directory there is an example that uses the SDK matrixMul example kernel. I can get it to work if I hardcode the mangled name in my Python code. I was wondering if there is a way around this.
To build (Mac OS X):
nvcc -I/Developer/CUDA/projects/matrixMul -c -o matrixMul_kernel.cu_o matrixMul_kernel.cu
g++ -L/usr/local/cuda/lib -lcudart -lcuda -dynamiclib matrixMul_kernel.cu_o -o libmatrixMul.dylib
$ nm -g libmatrixMul.dylib | grep ___device_stub
00000ec6 T ___device_stub__Z9matrixMulPfS_S_ii
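For reference, the symbol above looks like ordinary Itanium C++ ABI mangling: `_Z`, then the length of the function name (9), then the name itself (`matrixMul`), then the encoded parameter types (`Pf` is `float*`, each `S_` repeats it, `i` is `int`). A minimal Python sketch that splits such a symbol is shown here. The helper name `split_mangled` is made up for illustration, and it only handles simple, non-namespaced function names like this one.

```python
import re

def split_mangled(symbol):
    """Split a simple Itanium-mangled symbol into (name, encoded_params).

    Hypothetical helper: handles only the plain `_Z<length><name><params>`
    form, not namespaces, templates, or other nested names.
    """
    m = re.search(r'_Z(\d+)', symbol)
    if m is None:
        raise ValueError('no _Z<length> prefix in %r' % symbol)
    length = int(m.group(1))          # number of characters in the name
    start = m.end()                   # the name begins right after the digits
    name = symbol[start:start + length]
    params = symbol[start + length:]  # whatever follows encodes the argument types
    return name, params
```

Applied to the nm line above, this yields the plain name `matrixMul` and the parameter encoding `PfS_S_ii`.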
And this is how I have to call it in Python:
print 'Calling kernel'
Mul = matrixMul._Z9matrixMulPfS_S_ii(grid, threads)
# rather than Mul = matrixMul.matrixMul(grid, threads)
Mul(dC.data, dA.data, dB.data, WA, WB)
This is no big deal, but given that I don't really know what I am doing, I assume I am doing several things wrong that just happen to make this work.
I tried extern "C" for matrixMul in matrixMul_kernel.cu, and that didn't change anything. I also tried nvcc --host-compilation=C.
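One possible workaround that avoids hardcoding the name is to look the mangled symbol up from the `nm -g` output at run time. The sketch below is only that, a sketch: `find_mangled` is a hypothetical helper, it assumes the simple `_Z<length><name>...` form seen above, and it returns the first match, so overloaded kernels would need extra care.

```python
import re

def find_mangled(nm_output, func_name):
    """Return the first `_Z...` symbol in `nm -g` output for func_name, or None.

    Hypothetical helper: assumes a plain, non-namespaced function whose
    mangled form starts with `_Z<len(func_name)><func_name>`.
    """
    pattern = re.compile(
        r'(_Z%d%s\w*)' % (len(func_name), re.escape(func_name)))
    for line in nm_output.splitlines():
        m = pattern.search(line)
        if m:
            # Drops any prefix such as ___device_stub_ and keeps the _Z... part
            return m.group(1)
    return None
```

With the text of `nm -g libmatrixMul.dylib` captured in a string `nm_text` (for example via `subprocess`), the call from the post could then presumably be written as `Mul = getattr(matrixMul, find_mangled(nm_text, 'matrixMul'))(grid, threads)`, assuming `matrixMul` accepts the mangled name as an attribute the same way it does in the hardcoded version.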