Calling compiled CUDA files from Python

Based on this tutorial, it seems to be working perfectly… on UNIX.

http://bikulov.org/blog/2013/10/01/using-cuda-c-plus-plus-functions-in-python-via-star-dot-so-and-ctypes/

But I am on Windows, and after compiling a .cu file into a .so or .dll file, it seems uncallable from Python. ctypes definitely loads it as a WinDLL object, since it gives me a handle and an address, but the functions inside the compiled file can't be called.

mydll
<WinDLL 'C:…kernel.so', handle 7ffb07040000 at 0x19a5380c3c8>

I can't call its functions, and I can't find out how to use it, even after scouring the internet and going through the Python ctypes documentation several hundred times. What can be done?

Any help is appreciated :)

PS I tried using both extern "C" { … } as well as extern "C" cudamain{ … } and other function names, to no avail.

Before tackling Python via ctypes, I would suggest getting a Windows DLL project working correctly in Visual Studio. There are plenty of examples on the internet of how to create a Windows DLL that contains CUDA code. Once you've got that syntax figured out, it will demonstrate that the embedded functions in the DLL are visible/callable, and my guess is you will just sail through the Python/ctypes part at that point.

A quick google search of “cuda dll” turned this up as one of the top hits:

https://stackoverflow.com/questions/21956127/creating-cuda-dll-and-using-it-on-vc-project

which seems to give a fairly complete example.

Thanks for the suggestion. I've compiled a few projects in VS from the CUDA samples, some of which contain extern "C" in the .cpp file, and still had no success accessing the .dll files from Python ctypes.

I found that the compiled .ptx file contains the function names, but calling those didn't work either, even after adding the "shared" option to the compilation settings.

Is this a Windows limitation? nvcc doesn't seem to have the "-fPIC" option on Windows like it has on Mac… At this point I've given up and figured I'll just use C++ to drive the CUDA files instead of Python, but it's a bit more of a hassle.

current VS compilation options:

  • export to .dll
  • shared
  • export .ptx file

Is there something I'm missing?

Suggestion:

  1. Learn how to compile and use a Windows DLL containing CUDA C++ code, using an ordinary C (non-CUDA) interface into the DLL. There are examples of this all over the web. Demonstrate that you can get this working.

  2. Learn how to compile and use a Windows DLL (no CUDA) using the ctypes interface from Python. This has nothing to do with CUDA. Demonstrate that you can get this working.

You shouldn’t have to export or mess with PTX to do any of this.

If you can get those 2 things working, it should be a relatively straightforward matter to get the combination working.
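For step 2, here is a minimal sketch of the ctypes workflow with no CUDA involved. It uses the platform C runtime as a stand-in for "some DLL" (that choice, and the helper name load_c_runtime, are my assumptions, not something from this thread), and calls strlen after declaring its signature:

```python
import ctypes
import ctypes.util
import sys

def load_c_runtime():
    """Load the platform C runtime as a stand-in for a plain (non-CUDA) DLL."""
    if sys.platform == "win32":
        # msvcrt.dll ships with Windows and exports ordinary C functions.
        return ctypes.CDLL("msvcrt")
    # On Linux/macOS find_library may return None; CDLL(None) then exposes the
    # symbols already loaded into the Python process, which include libc.
    return ctypes.CDLL(ctypes.util.find_library("c"))

libc = load_c_runtime()

# Declare the C signature explicitly: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"cuda")
print(length)  # 4
```

If this works but your own DLL does not, the problem is most likely in how the DLL exports its functions rather than in ctypes itself.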

Thanks a lot, I finally got it working! Turns out my .cu file was missing __declspec(dllexport).

For anyone looking at this in the future, you can put this at the top of your CUDA code:

#define DLLEXPORT extern "C" __declspec(dllexport)

and put DLLEXPORT in front of every function you want to be accessible from Python ctypes. Just make sure you compile it to a shared library (use the -shared flag if using nvcc on the command line).
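To make that concrete, here is a rough sketch of what such a .cu file might look like, written out from Python so it can be handed to nvcc afterwards. The names add_kernel, add_vectors and write_kernel are hypothetical, and the CUDA code is only an illustration of the DLLEXPORT pattern with minimal error handling. Note that __declspec(dllexport) is specific to Windows compilers:

```python
import os
import tempfile

# Hypothetical kernel.cu contents; add_kernel and add_vectors are made-up names.
KERNEL_SOURCE = r'''
#include <cuda_runtime.h>

#define DLLEXPORT extern "C" __declspec(dllexport)

__global__ void add_kernel(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}

// Host-side wrapper: the only symbol Python needs to see.
// Returns 0 (cudaSuccess) if the final copy succeeded, else a CUDA error code.
DLLEXPORT int add_vectors(const float* a, const float* b, float* out, int n)
{
    float *d_a = 0, *d_b = 0, *d_out = 0;
    size_t bytes = (size_t)n * sizeof(float);

    cudaMalloc((void**)&d_a, bytes);
    cudaMalloc((void**)&d_b, bytes);
    cudaMalloc((void**)&d_out, bytes);
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_kernel<<<blocks, threads>>>(d_a, d_b, d_out, n);

    // This copy runs on the default stream, so it waits for the kernel.
    cudaError_t status = cudaMemcpy(out, d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    return (int)status;
}
'''

def write_kernel(path):
    """Write the CUDA source to disk so nvcc can compile it."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(KERNEL_SOURCE)
    return path

# The temp directory is just a convenient place for this sketch.
kernel_path = write_kernel(os.path.join(tempfile.gettempdir(), "kernel.cu"))
print(kernel_path)
```

The point of the design is that only the plain C wrapper is exported; the __global__ kernel stays internal to the DLL, so Python never has to deal with launch syntax.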

Also along the way I learned that you can circumvent the entire Visual Studio process, with a few lines in Python! Just write this:

import subprocess

# Some options are specific to my machine, e.g. sm_61 (compute capability 6.1) for my 1050 Ti.
nvcc_options_dll = [
    'nvcc', '-shared', r"C:\some_path\kernel.cu",
    '-O3', '--ftz=true', '--fmad=true', '-arch=sm_61',
    '-o', r"C:\some_path\cresult.dll",
]

compile_cuda = subprocess.Popen(
    nvcc_options_dll, stdin=subprocess.PIPE,
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = compile_cuda.communicate()
if compile_cuda.returncode != 0:  # nvcc failed; err holds its messages
    raise RuntimeError(err.decode(errors='replace'))

and you will find a .dll in your directory, from where you can proceed with the first tutorial. I found this handy trick digging inside the source code for cuda4py, an amazing module.
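For the calling side, here is a rough sketch of how the resulting .dll might be used from ctypes. It assumes the DLL exports a hypothetical function int add_vectors(const float*, const float*, float*, int) that returns 0 on success; that signature and the helper names are my own illustration, so adjust them to whatever your .cu file actually exports:

```python
import ctypes
import os

def to_float_array(values):
    """Copy a Python sequence into a contiguous C float array."""
    return (ctypes.c_float * len(values))(*values)

def add_vectors(dll_path, a, b):
    """Call the hypothetical exported add_vectors(a, b, out, n) from the DLL."""
    # extern "C" functions use the cdecl convention, which is what CDLL expects.
    lib = ctypes.CDLL(dll_path)

    # Declare the signature so ctypes converts and checks the arguments.
    float_ptr = ctypes.POINTER(ctypes.c_float)
    lib.add_vectors.argtypes = [float_ptr, float_ptr, float_ptr, ctypes.c_int]
    lib.add_vectors.restype = ctypes.c_int

    n = len(a)
    c_a, c_b = to_float_array(a), to_float_array(b)
    c_out = (ctypes.c_float * n)()  # zero-initialised output buffer
    status = lib.add_vectors(c_a, c_b, c_out, n)
    if status != 0:
        raise RuntimeError(f"add_vectors failed with error code {status}")
    return list(c_out)

# Same placeholder path as in the compile step; only runs if the DLL exists.
dll_path = r"C:\some_path\cresult.dll"
if os.path.exists(dll_path):
    print(add_vectors(dll_path, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

On 64-bit Windows, WinDLL and CDLL behave the same because there is only one calling convention, but CDLL is the better match for extern "C" functions in general.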

Cheers, and happy CUDAing!