Compiling CUDA for Python

Hi, I’m trying to create a module with CUDA for Python to do some FFT work; however, I’m not really sure how I should compile it. I haven’t been using Linux that long, so I’m not sure about some of the details behind .so files.

I’ve been searching the internet for examples, but thus far I only get errors such as "./ undefined symbol: __cudaTextureFetch" and "ImportError: dynamic module does not define init function (inittiny)".

I’m using the 64-bit SDK. Basically, all I did for the .cu file was take an already working Python module I was testing and include cutil.h.

Any ideas on what compiler flags I need to use?


Using CUDA in Python modules should work. Can you post your test code, what command you used to compile, and exactly what the error is?

Here are my compile commands with results:

nvcc -o tiny.o -shared -c -Xcompiler -fPIC -I/usr/local/cuda/NVIDIA_CUDA_SDK/common/inc -L/usr/local/cuda/NVIDIA_CUDA_SDK/lib

gcc -o -shared tiny.o -I. -I/usr/local/cuda/include -I/usr/local/cuda/NVIDIA_CUDA_SDK/common/inc -I -DUNIX -L/usr/local/cuda/lib -L/usr/local/cuda/NVIDIA_CUDA_SDK/lib

In Python I get:

import tiny
Traceback (most recent call last):
File “”, line 1, in ?
ImportError: ./ undefined symbol: __cudaTextureFetch

Also, I tried:

nvcc -o tiny.o -shared -I/usr/local/cuda/NVIDIA_CUDA_SDK/common/inc -L/usr/local/NVIDIA_CUDA_SDK/lib -Xcompiler -fPIC -deviceemu

gcc -o -shared /home/username/voltools/gpgpu/tiny.o -fPIC

and in Python I get:

import tiny
Traceback (most recent call last):
File “”, line 1, in ?
ImportError: dynamic module does not define init function (inittiny)

By the way, I am using Python 2.4.

You need to add -lcudart to your linking step. Also, be sure inittiny is declared extern "C" if using CUDA 1.1 (it defaults to C++ name mangling).
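
If it is not obvious which of the two problems a particular build has, one option is to probe the shared object with ctypes before importing it. This is only a rough sketch, and it assumes a newer Python than the 2.4 mentioned above (ctypes ships with 2.5 and later). The helper name diagnose_module is made up for illustration, and the mangled name it looks for assumes the usual g++ (Itanium ABI) mangling of a function declared as void inittiny().

```python
import ctypes

def diagnose_module(lib_path, init_name):
    """Report whether a shared object loads and which init symbol it exports.

    Hypothetical helper for illustration; not part of CUDA or Python.
    """
    try:
        # ctypes asks the loader to resolve symbols at load time, so a
        # library missing from the link step (e.g. no -lcudart) should
        # show up here as an OSError naming the undefined symbol.
        lib = ctypes.CDLL(lib_path)
    except OSError as exc:
        return "load failed: %s" % exc
    if hasattr(lib, init_name):
        return "ok"
    # Assumed g++ mangling of `void <name>()`: _Z<length><name>v
    mangled = "_Z%d%sv" % (len(init_name), init_name)
    if hasattr(lib, mangled):
        return "mangled"
    return "missing"
```

Calling diagnose_module("./tiny.so", "inittiny") (the path is an assumption) would then give "ok" when the unmangled init function is exported, "mangled" when only the C++-mangled name is found (so extern "C" is missing), and a "load failed" message when the loader cannot resolve something such as __cudaTextureFetch.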

Alright, cool. I got it to compile and can load the module into Python.


Now for the last thing standing in my way: any tips on how to convert a NumPy 3D array into a CUDA 3D array?

I had a way of converting a NumPy 3D array to a C 3D array (in my example code), but it looks a little different with CUDA, although I might be overlooking something.

Again thanks for the help.

Take a look at PyStream (there is a link on CUDAZone, or just Google it).
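
On the array question itself: a plain CUDA device allocation is one flat block of memory, so it is usually easier to hand over the NumPy data as a single contiguous buffer than as a C-style array of pointers. Here is a minimal sketch, assuming the data should end up as float32 in row-major (C) order. The helper names to_device_layout and linear_index are made up for illustration and are not part of PyStream.

```python
import numpy as np

def to_device_layout(volume):
    """Return (flat, dims): a C-contiguous float32 1-D view or copy of a
    3-D array, plus its (nz, ny, nx) shape."""
    vol = np.ascontiguousarray(volume, dtype=np.float32)
    if vol.ndim != 3:
        raise ValueError("expected a 3-D array")
    # ravel() on a C-contiguous array flattens in row-major order.
    return vol.ravel(), vol.shape

def linear_index(z, y, x, dims):
    """Offset of element [z, y, x] inside the flat row-major buffer."""
    nz, ny, nx = dims
    return (z * ny + y) * nx + x

# Small self-check: the flat buffer and the 3-D array agree element-wise.
volume = np.arange(2 * 3 * 4, dtype=np.float64).reshape(2, 3, 4)
flat, dims = to_device_layout(volume)
assert flat[linear_index(1, 2, 3, dims)] == volume[1, 2, 3]
```

The flat buffer's address (flat.ctypes.data) and size in bytes (flat.nbytes) are the kind of arguments a pointer-based copy such as cudaMemcpy expects, and a kernel can use the same (z * ny + y) * nx + x formula to address elements.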

When running some of my test code and PyStream’s cufft_demo code, I get:

cufft: ERROR: /root/cuda-stuff/sw/gpgpu_rel1.1/cufft/src/, line 106

along with some plan problems.

Could this be a deeper problem?

EDIT: Alright, I updated to the newest drivers: 169.04, and I no longer get the cufft errors; however, I still get a segmentation fault, but I just reran it, so I’ll try debugging it.

You can also take a look at It provides Python bindings for CUDA (no FFTs though, but you might be able to add that) and contains some examples where .cu files have been used from Python by first compiling them to .cubin files and then loading the kernels contained in the .cubin files. The Python CUDA bindings are LGPL, if it does not say so somewhere in the code - I wrote them.