Separate Compilation and Linking of CUDA C++ Device Code

Originally published at: https://developer.nvidia.com/blog/separate-compilation-linking-cuda-device-code/

Managing complexity in large programs requires breaking them down into components that are responsible for small, well-defined portions of the overall program. Separate compilation is an integral part of the C and C++ programming languages, allowing portions of a program to be compiled into separate objects and then linked together to form an executable or…

Above "main.cpp" is missing some lines.

You're right! Thanks for catching that. I added in the missing lines.

Hi Tony,
I am trying to run CUDA code on Amazon AWS. I am using the K-means CUDA implementation by Serban Giuroiu, where different functions are written in different files.

I just need to alter a few parameters and run the code with my data. The default object file runs fine, but when I try to compile the .cu files after changing the parameters, I get errors. I can give more details about the errors. The nvcc -c command creates objects, but these objects do not run and give errors.

I have a file cuda_main.cu that calls a function cuda_kmeans(), which is written in the cuda_kmeans.cu file, where I need to change a few values. Any suggestion on how I should compile them from the command line? I'll appreciate any suggestions.

I assume you're talking about https://github.com/serban/k.... I checked this out, and as it is in the repo, the makefile is not set up to use separate compilation units. Make sure to add the -dc option when building the objects. If you're linking with nvcc, that should cover it. My suspicion is that you are calling one of the __device__ routines in cuda_kmeans.cu from a kernel in cuda_main.cu. If this is not the problem, comment back and I'll help you dig deeper. Thanks!
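
For reference, here's a minimal sketch of what the build could look like with separate compilation enabled, using the file names you mentioned (your actual makefile may need additional flags and targets):

# compile each translation unit to an object containing relocatable device code
nvcc -dc cuda_main.cu -o cuda_main.o
nvcc -dc cuda_kmeans.cu -o cuda_kmeans.o

# link with nvcc so the device code gets linked as well
nvcc cuda_main.o cuda_kmeans.o -o kmeans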

-Tony Scudiero
NVIDIA

Thanks, Tony, for your reply. After changing the code, if I 'make' it again, it seems to work fine.
Thanks again.

How do you set this build up with Nsight Eclipse Edition? I can build with your makefile, but not from Eclipse. I posted all the details on SO: http://stackoverflow.com/qu...

Great. A nice reference for novices in CUDA programming.

Just for completeness: in the section 'Advanced Usage: Using a Different Linker', the interested reader should run the following commands:

# object files
nvcc -x cu -arch=sm_20 -I. -dc main.cpp -o main.o
nvcc -x cu -arch=sm_20 -I. -dc particle.cpp -o particle.o
nvcc -x cu -arch=sm_20 -I. -dc v3.cpp -o v3.o

# gpu object file
nvcc -arch=sm_20 -dlink v3.o particle.o main.o -o gpuCode.o

# final link
g++ -L/usr/local/cuda-7.5/lib64 gpuCode.o main.o particle.o v3.o -lcudart -o app

The last command needs the -L option to define the location of libcudart.so in order to avoid problems.

I tried to run the code in MSVS 2013 and got an "unresolved extern function" error for advance in the particle class. Why is that? Thank you.

That error indicates that you haven't told the CUDA compiler to generate relocatable device code, i.e. linkable device code. In your project property pages, go to CUDA C/C++ -> Common and look for the field "Generate Relocatable Device Code"; use the pull-down menu to set this to "Yes (-rdc=true)". You should then be able to build and run.
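
For anyone building from the command line instead of Visual Studio, the equivalent is the -rdc=true flag; -dc is shorthand for -rdc=true -c. A minimal sketch, using one of the file names from the article:

# compile with relocatable (linkable) device code enabled
nvcc -x cu -rdc=true -c particle.cpp -o particle.o
# equivalently:
nvcc -x cu -dc particle.cpp -o particle.o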

It wasn't evident to me at first, but after reading it again, I wanted to note, for the NVIDIA CUDA developer community, that you can use EITHER __host__ or __device__ decorations (prefixes) before your class methods. In the code I was working on, it was useful to use a class from __global__ code, and it compiled successfully when the class method had a __device__ decoration. I also found that if I want the class, on the device, to "instantiate" or "contain" some arrays, objects, ints (integers), etc., then that instantiation method (constructor) in the class needs to be decorated with __device__ as well. I noted this in my README.md to get the good word out:
https://github.com/ernestya...
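
To make the pattern concrete, here's a minimal sketch (the class and member names are purely illustrative, not from the article or my repo):

// dev_grid.cu (hypothetical example)
struct DevGrid {
    int N;

    // Constructor decorated __device__ so the object can be
    // instantiated from device code.
    __device__ DevGrid(int n) : N(n) {}

    // Method decorated __device__ so it can be called from a kernel.
    __device__ int flatten(int i, int j, int k) const {
        return i + N * (j + N * k);
    }
};

__global__ void useGrid() {
    DevGrid g(8);                  // instantiated on the device
    int idx = g.flatten(1, 2, 3);  // __device__ method called from a __global__ kernel
    (void)idx;                     // silence unused-variable warnings
}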

Thanks, Murphy and Scudiero, for the article! And I find Harris's articles and GitHub repo to be super useful as well.

You can also combine `__host__` and `__device__` on a single method.

I get these warnings when I combine __host__ and __device__ and run my makefile:
physlib/dev_R3grid.cu(8): warning: a __device__ function("dev_Grid3d::dev_Grid3d") redeclared with __host__ __device__, hence treated as a __host__ __device__ function

physlib/dev_R3grid.cu(12): warning: a __device__ function("dev_Grid3d::flatten") redeclared with __host__ __device__, hence treated as a __host__ __device__ function

My program is in this subdirectory, where I implemented a finite-volume upwind method for convection in 3 dimensions in CUDA C/C++:
https://github.com/ernestya...
and the C++ class that I wanted to run on the device is here:
https://github.com/ernestya...

In general, as good programming practice, when the compiler emits warnings, is it always best to make changes until they go away? These weren't errors, and my executable did what I wanted.

* Tangent side note 1: Dr. Harris, since I am interested in (passionate about) implementing combustion CFD on the GPU, I have been going through, from the graduate-school level up, the "standard" ("canonical") computational methods for aerospace engineering (finite difference, finite volume, etc.), and I found most solvers implemented in 2 dimensions. Why not in 3 dimensions, since we obviously live in a 3-dimensional world?

* Tangent side note 2: @Mark_Harris:disqus, in your blog post about finite difference methods in 3 dimensions, I tried increasing the grid size dimension from 64 to 92 and above and got segmentation faults. Why doesn't it scale? I don't think it's a RAM (memory) problem. cf. https://devblogs.nvidia.com...

Yes, it's important to fix warnings. In this case you need to make the declaration (in .h) and definition (in .cu) of these methods match (__host__ __device__ on both).
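
In other words, something like this minimal sketch, using the dev_Grid3d names from your warnings (the signatures and member names here are just illustrative):

// dev_R3grid.h -- declaration carries __host__ __device__
struct dev_Grid3d {
    __host__ __device__ dev_Grid3d(int n);
    __host__ __device__ int flatten(int i, int j, int k) const;
    int N;
};

// dev_R3grid.cu -- definition must carry the same decorations
__host__ __device__ dev_Grid3d::dev_Grid3d(int n) : N(n) {}

__host__ __device__ int dev_Grid3d::flatten(int i, int j, int k) const {
    return i + N * (j + N * k);
}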

Tangent 1: Probably because it's simpler to explain and diagram 2D implementations.

Tangent 2: I will look at the error. Please don't cross-post.

Thanks!

It's my fault: __host__ __device__ works for a single method, and you were right, @Mark_Harris:disqus; __host__ __device__ has to be in BOTH the declaration and the definition.

Thanks for all your help!

Hi,

I have a question about the program. I changed the number of steps in the main function from 100 to 500000, and the program crashed.

Why does the program crash? Is there a kernel launch limit?

I have a Tesla k20m card.

Thank you.

Hi Albert, there shouldn't be any limit that would cause it to crash under that change. I tried it myself on my laptop and it seems to run fine -- I could only wait for about 50,000 iterations but I see nothing in the code that could cause it to crash after more, other than random bit errors corrupting memory. It took several minutes to run 50,000 iterations on my laptop GPU (a few years old).

Hi Mark, thanks for the reply. I tried the program on another workstation with a different OS and a different GPU, and it seems that the problem is with my Tesla or my OS. I'm investigating this. Thank you!

Hi Mark,

I found the problem; it's a temperature problem... Do you know if it is possible to buy the heat sink for the Tesla K20m?

Thank you.