Porting to the GPU: any easy way to port code?

I am trying to port some solid state physics code onto the GPU. The program is currently set up with a CMakeLists.txt file: you run make in a build directory, and there is an include directory and a source directory. The modifications I’m making, i.e. the kernel call, CUDA memory allocations, etc., are in a relatively small part of the code. Does anybody have an example Makefile, or a link to one, that shows how to declare the paths to the .cu, .cpp, and .h parts of your program, link with the nvcc compiler, and so on? I find the programs that come with the toolkit are a bit too simple and small to really get a feeling for how to make a CUDA-based makefile. I am posting my CMakeLists.txt; if anybody could be kind enough to suggest the easiest way to change this file to include .cu files, I would be very appreciative.
CMakeLists.txt (7.46 KB)

Check out the FindCUDA.cmake module - it’s available in the upcoming CMake 2.8 (RC builds here), and it will probably take care of all your needs.

The easiest way to incorporate .cu file compilation into an existing CMake build is to use the FindCUDA.cmake build scripts. FindCUDA will be included in the next version of CMake (2.8), and you can find a release candidate on their website (http://www.cmake.org/files/v2.8/). After installing CMake, edit your CMakeLists.txt file to find the CUDA package:

find_package(CUDA REQUIRED)

Change add_library or add_executable to cuda_add_library or cuda_add_executable:

cuda_add_library(algebra SHARED
  # list the library's .cpp and .cu sources here, for example:
  algebra.cpp
  algebra_kernel.cu
  )

See the documentation for FindCUDA for more information.
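Putting the two edits together, a minimal CMakeLists.txt using FindCUDA might look like the following sketch (project and file names are hypothetical placeholders):

```cmake
# A minimal CMakeLists.txt for a mixed .cpp/.cu project using FindCUDA.
cmake_minimum_required(VERSION 2.8)
project(myproject)

# Locate the CUDA toolkit; configuration fails if nvcc is not found
find_package(CUDA REQUIRED)

# Headers shared by the host .cpp files and the device .cu files
include_directories(${CMAKE_SOURCE_DIR}/include)

# cuda_add_executable sends the .cu files through nvcc, compiles the
# .cpp files with the regular host compiler, and links the result
cuda_add_executable(myproject
  source/main.cpp
  source/solver.cpp
  source/solver_kernel.cu
  )
```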

Thanks for the speedy replies. I’ll check out the FindCUDA.cmake release. I guess that means tinkering with the Makefile the CUDA SDK gives you probably isn’t worth the effort, and using FindCUDA.cmake would be more profitable?


I think since you already have a CMake build, using FindCUDA.cmake will be the quickest way to get to a solution.

I’ve downloaded make2cmake.cmake, parse_cubin.cmake, FindCUDA.cmake, and run_nvcc.cmake from the following link:
since I couldn’t successfully complete an svn checkout from the University of Utah link. I altered my CMakeLists.txt file to include find_package(CUDA REQUIRED) and switched to cuda_add_executable, etc.

I have put all these files in a project/Util directory, which also contains the FindBLAS.cmake and FindLAPACK.cmake modules. I am also quite sure my CMakeLists.txt picks it up via CMAKE_MODULE_PATH. However, when I run cmake .. from my build directory I get the following message:

-- Found CUDA: /usr/local/cuda
CMake Error at Util/FindCUDA.cmake:275 (message):
parse_cubin.cmake not found in CMAKE_MODULE_PATH
Call Stack (most recent call first):
Util/FindCUDA.cmake:685 (cuda_find_helper_file)
CMakeLists.txt:48 (find_package)

It’s picking up the standard CUDA install path but not the other .cmake files. I’m quite sure parse_cubin.cmake is in the directory and named properly.

My version of cmake is cmake version 2.6-patch 4.
Is this something very basic that I am missing?

Thanks for any help.

Yes. Please use the CMake release candidate of 2.8, not 2.6. It has the most up-to-date version of FindCUDA as part of it. I have no idea what state the FindCUDA code in the repository you pulled from is in.

The version maintained by the author (me) is in the CMake CVS repository, though I try to keep the version in the University of Utah’s svn repository relatively up to date. I’m not sure what problems you were having; this should work:

svn co https://code.sci.utah.edu/svn/findcuda/trunk/CMake/cuda FindCUDA

Also, the location of the helper scripts (parse_cubin.cmake and friends) changed from being next to FindCUDA.cmake to a subdirectory called FindCUDA, to accommodate inclusion in the CMake repository. But looking at the FindCUDA.cmake script on that website, that isn’t the case there, and it should just have found them.

Please try the latest RC of CMake, found at http://www.cmake.org/files/v2.8/, and let me know if you have problems.

Great! I downloaded the rc3 cmake file from that website and installed it. When I ran cmake on my project with cuda_add_executable and .cu files, everything compiled very handily. The only minor glitch was that I had to manually change the CMakeCache.txt file so that it pointed to the /usr/local/cuda/lib64 libraries instead of the 32-bit /usr/local/cuda/lib ones. Not sure why that was. This is a great script. Thanks again.

Whoops, got a little ahead of myself. Looks like I need it to read stubs-64.h instead of stubs-32.h. Is this commonly encountered?
In file included from /usr/include/features.h:376,
from /usr/local/cuda/bin/../include/host_config.h:68,
from /usr/local/cuda/bin/../include/cuda_runtime.h:45,
from <command-line>:0:
/usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or directory
CMake Error at CMakeFiles/exciton09_generated_kernel.cu.o.cmake:237 (message):

I think you somehow configured a 32 bit build and are having problems cross compiling.

Check all your environment variables for ‘-m32’ particularly in CFLAGS and CXXFLAGS. This would explain why FindCUDA picked up the 32 bit libraries.
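From the shell, a quick (hypothetical) way to scan for that is:

```shell
# Print the usual compiler-flag variables and look for a stray -m32
echo "CFLAGS=${CFLAGS}"
echo "CXXFLAGS=${CXXFLAGS}"
env | grep -- '-m32' || echo "no -m32 found in the environment"
```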

One way to verify this is to add the following line to your CMakeLists.txt:

message("sizeof(void*) = ${CMAKE_SIZEOF_VOID_P}")

It should say 4 for 32-bit and 8 for 64-bit builds.

Well, I redid the cmake run and forced it to acknowledge 64 bits by making a switch in FindCUDA.cmake here


I still had to change a few libraries in the cache, but otherwise everything seemed to go smoothly.
Now it looks as though it’s willing to compile and is fetching the correct libraries; however, I’m still getting these errors when building my first kernel.

[ 14%] Building NVCC (Device) object ./exciton09_generated_ATOM_SCF1.cu.o
/usr/lib/gcc/x86_64-redhat-linux/4.4.0/../../../../include/c++/4.4.0/ext/atomicity.h(46): error: identifier "__sync_fetch_and_add" is undefined

/usr/lib/gcc/x86_64-redhat-linux/4.4.0/../../../../include/c++/4.4.0/ext/atomicity.h(50): error: identifier "__sync_fetch_and_add" is undefined

I think this is in Fedora 11 / gcc 4.4 territory now and may deserve a whole new thread, unless this is FindCUDA related or you have any ideas.


This will cause problems if you are trying to build 64-bit device code but CMake thinks it is making a 32-bit application.

Some things to help with errors like this:

  1. Run make with VERBOSE=1: ‘make VERBOSE=1’. This will show you all the nvcc commands. Find the one that says “Generating ./exciton09_generated_ATOM_SCF1.cu.o” (there will be another invocation of nvcc preceding this one that says it is generating the dependency file).

  2. Try copying the command and running it from the same build directory to reproduce the error messages.

  3. Start to strip out more and more flags to nvcc to see if you can get it to build. These will be flags that come with -Xcompiler. These are flags passed to the host compiler (gcc) through nvcc.

If that doesn’t work, you can try setting CUDA_PROPAGATE_HOST_FLAGS to OFF within your CMakeLists.txt file or through ccmake (or the GUI). This will stop gcc flags from being passed through nvcc.

You can also try toggling the CUDA_HOST_COMPILATION_CPP flag ON and OFF to see if that has any effect.
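As a sketch, both of those knobs can be set directly in CMakeLists.txt before the cuda_add_executable/cuda_add_library call:

```cmake
# Stop FindCUDA from forwarding host CFLAGS/CXXFLAGS to nvcc via -Xcompiler
set(CUDA_PROPAGATE_HOST_FLAGS OFF)

# Compile the host-side portion of .cu files as C++ (toggle OFF to try C)
set(CUDA_HOST_COMPILATION_CPP ON)
```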

Well, I’ve managed to get around that error message by being very careful not to have the STL and various other library calls included in my .cu files; that is to say, I have separated as much CUDA code from C++ as possible, and the compiler seems less likely to complain. Thanks for the help.

Maybe there is one more thing. Sorry, I’m new to CMake and CUDA, so learning both simultaneously has added slight complications.
My SDK is in a non-standard location. What commands should I add to my CMakeLists.txt file to make it pick up the CUDA cutil.h headers? I am particularly interested in getting
CUDA_SAFE_CALL working. I’ve posted my make cache and make lists files if those help you get an idea.
Thanks again; I hope you’re on the NVIDIA payroll.
CMakeLists.txt (7.78 KB)
CMakeCache.txt (19.5 KB)

I am on the NVIDIA payroll. :)

FindCUDA doesn’t support finding stuff from the SDK. It’s mainly for using the CUDA C toolkit.

That being said, FindCUDA should find the root path of the SDK if it is in a default installation location. You can use this path to help find whatever you need from the SDK. It won’t help if you installed it into a non-standard location, though.

CUDA_SDK_ROOT_DIR should be a cache variable you can set via ccmake or while running cmake for the first time (cmake -DCUDA_SDK_ROOT_DIR:PATH=/GPU Computing/C). Once you have a reasonable value for this, you can use it to find other stuff in the SDK. There are examples of how to do this in FindCUDA.cmake (search for CUDA_SDK_ROOT_DIR).

Here’s a snippet:

# First search for cutil.h in the SDK
find_path(CUDA_CUT_INCLUDE_DIR
  cutil.h
  PATHS ${CUDA_SDK_SEARCH_PATH}
  PATH_SUFFIXES "common/inc"
  DOC "Location of cutil.h"
  NO_DEFAULT_PATH
  )

# Now search system paths
find_path(CUDA_CUT_INCLUDE_DIR cutil.h DOC "Location of cutil.h")

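Once CUDA_CUT_INCLUDE_DIR has been found, a hypothetical follow-up in your CMakeLists.txt could wire cutil into the build; the target name below is a placeholder:

```cmake
# Make cutil.h visible to both host and device compilation
include_directories(${CUDA_CUT_INCLUDE_DIR})

# The library name varies between SDK releases, hence the NAMES list
find_library(CUDA_CUT_LIBRARY
  NAMES cutil cutil64 cutil_x86_64
  PATHS ${CUDA_SDK_ROOT_DIR}
  PATH_SUFFIXES "lib" "common/lib"
  DOC "Location of the cutil library"
  )

# Link it into your (hypothetical) executable target
target_link_libraries(my_target ${CUDA_CUT_LIBRARY})
```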

OK. I wanted to use the safe call because I’m having a problem executing my compiled binary. I’ve isolated the code, and it compiles and runs fine in the context of the CUDA SDK. I’ve also run the code fine in device-emulation mode. It is just in device mode that the whole execution hangs when I get to my first cudaMalloc.

Is the fact that I’m not building to the typical ~/GPU_COMPUTING/C/bin/linux/release directory likely to cause such a problem?

The code compiles fine, but upon execution it gets to the cudaMalloc call and just hangs, with ‘top’ showing the CPU running at 100%, apparently waiting for the CUDA call to finish.
It hangs until I ctrl-c to kill it. Have you ever encountered this?

I’ve not seen that particular problem. If you are worried about not building in the SDK, then you should try running “ldd” on your executable and make sure that all the right libraries are being loaded. Outside of getting the wrong library at run time, I can’t see a problem with using a different build location.
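For example, assuming the built binary is ./exciton09 (the snippet below uses /bin/sh only so that it is runnable anywhere):

```shell
# Print the shared libraries the dynamic linker resolves for a binary.
# Substitute your own executable, e.g. ./exciton09, and check that
# libcudart comes from /usr/local/cuda/lib64 rather than /usr/local/cuda/lib.
ldd /bin/sh
```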

You might want to start a different thread about this issue. It seems as though you got to a building state.

You might also want to be sure to check the return codes from all of the cuda calls, just to be sure that the error isn’t happening somewhere else.
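If you would rather not depend on the SDK’s cutil for CUDA_SAFE_CALL, a small hand-rolled macro checks the runtime’s return codes the same way. This is a sketch to be compiled with nvcc; the macro name and the file layout are my own invention, not part of the toolkit:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Minimal stand-in for cutil's CUDA_SAFE_CALL: report file/line and the
// runtime's error string, then abort, whenever a CUDA call fails.
#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err = (call);                                         \
    if (err != cudaSuccess) {                                         \
      fprintf(stderr, "CUDA error at %s:%d: %s\n",                    \
              __FILE__, __LINE__, cudaGetErrorString(err));           \
      exit(EXIT_FAILURE);                                             \
    }                                                                 \
  } while (0)

int main() {
  float* d_data = 0;
  CUDA_CHECK(cudaMalloc((void**)&d_data, 1024 * sizeof(float)));
  CUDA_CHECK(cudaFree(d_data));
  return 0;
}
```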

I think everything has built and it’s into code/debug mode. I noticed when I compiled in verbose mode that nvcc was getting passed ,"-fPIC" in that form. Is this right? CUDA_PROPAGATE_HOST_FLAGS is set to ON in my CMakeLists.txt.

-- Generating /home/lamberh/NVIDIA_GPU_Computing_SDK/C/src/exciton09/build/./exciton09_generated_ATOM_SCF_kernel.cu.o
/usr/local/cuda/bin/nvcc /home/lamberh/NVIDIA_GPU_Computing_SDK/C/src/exciton09/source/ATOM_SCF_kernel.cu -arch sm_13 --compiler-options -fno-inline --device-emulation -D_DEVICEEMU -g -m64 -Xcompiler ,"-fPIC" -DNVCC -c -o /home/lamberh/NVIDIA_GPU_Computing_SDK/C/src/exciton09/build/./exciton09_generated_ATOM_SCF_kernel.cu.o -I/home/lamberh/NVIDIA_GPU_Computing_SDK/C/common/inc -I/usr/local/cuda/include -I/home/lamberh/NVIDIA_GPU_Computing_SDK/C/src/exciton09/include -I/include -I/usr/local/cuda/include

As for the cudaMalloc hanging, it looks like a pointer issue and I’ll resolve it elsewhere. (I may have more threads than a GPU soon.)

Thanks again for all the help.

You’re welcome. Glad to be of assistance.

Yes, it’s passing -fPIC to the host compiler in that form. I believe you are referring to all the quotes. I always put quotes around the host flags so that arguments with spaces get handled properly. It’s easier to just add the quotes than to try to detect the cases where you need them.