The simplest way to compile a .cu file (looking for a "hello world" example for compilation)


I’ve recently started working with CUDA, and I was surprised by how difficult it is to find any kind of “getting started” information about the compilation process.

Judging from the online material and posts on various forums, the usual approach seems to be to adapt one of the makefiles from the SDK.

I’m not sure why writing a makefile should be necessary. In other words: if I have a small .cu file, what is wrong with compiling it with a simple call to

nvcc -o EXECUTABLE_NAME "…/" -I/usr/local/cuda/include -lcudart -L/usr/local/cuda/lib

I guess there must be some kind of problem with this approach, as my programs do work but seem to be rather underperforming… I guess there is a minimum set of flags that I need in order to ensure adaptation to the local architecture.

Any suggestion is very welcome…

It isn’t - you don’t need a makefile for trivial compilation.

Nothing. It can be simpler than that:

nvcc -o executable source.cu

will build an executable (source.cu here is just a placeholder for your own file) that will run on any CUDA-compatible card, as long as it doesn’t have any external dependencies outside of CUDA or the standard C/C++ library.
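To make the single-command build concrete, here is a minimal sketch that writes a tiny kernel to a file and compiles it. The names hello.cu, hello and fill_index are made up for this illustration, and the compile step is skipped if nvcc is not on the PATH.

```shell
#!/bin/sh
# Write a minimal CUDA source file (hello.cu is a hypothetical name).
cat > hello.cu <<'EOF'
#include <cstdio>
#include <cuda_runtime.h>

// Each thread writes its own index into the output array.
__global__ void fill_index(int *out)
{
    out[threadIdx.x] = threadIdx.x;
}

int main()
{
    const int n = 8;
    int host[n];
    int *dev = 0;

    if (cudaMalloc((void **)&dev, n * sizeof(int)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }

    // Launch one block of n threads, then copy the result back.
    fill_index<<<1, n>>>(dev);
    if (cudaMemcpy(host, dev, n * sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        printf("kernel launch or copy failed\n");
        cudaFree(dev);
        return 1;
    }
    cudaFree(dev);

    for (int i = 0; i < n; ++i)
        printf("hello from thread %d\n", host[i]);
    return 0;
}
EOF

# One nvcc call: no makefile, no include path, no explicit library flags.
if command -v nvcc >/dev/null 2>&1; then
    nvcc -o hello hello.cu
else
    echo "nvcc not found; hello.cu was written but not compiled"
fi
```

If the build succeeds, run the result with ./hello on a machine with a CUDA-capable card.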

I doubt that has anything to do with compilation.

There isn’t. Almost without exception, nvcc compilation flags only turn on specific architectural features at the PTX generation stage (things like double precision, atomic memory operations, C++ runtime support, in-kernel printf support, etc.). If you don’t use those features, the PTX code produced will be almost identical. If you try to use those features without the correct flags, the compiler will generate warnings or errors. nvcc uses very aggressive optimization settings during C compilation, and the PTX assembler and driver have a lot of internal architecture-specific optimisations over which there is basically no programmer control. About the only real compilation options that can affect performance are the floating point compliance settings (i.e. whether to use exact or fast versions of some math library functions and operators) and register usage limits. Both are discussed in some detail in the programming guide.
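As a rough sketch of what those options look like on the command line: -arch, -use_fast_math, -maxrregcount and --ptxas-options=-v are nvcc options, while kernel.cu, the output names, the sm_20 architecture and the register cap of 32 are placeholder values chosen only for this example. Use the architecture value that matches your card and toolkit version.

```shell
#!/bin/sh
# Hypothetical names and values, chosen only for illustration.
SRC=kernel.cu
ARCH=sm_20   # assumption: replace with the value for your card and toolkit

# Baseline build that enables the features of the chosen architecture.
BASE="nvcc -arch=$ARCH -o app $SRC"

# Use the fast, less accurate versions of some math functions.
FAST="nvcc -arch=$ARCH -use_fast_math -o app_fast $SRC"

# Cap registers per thread and ask ptxas to report actual register usage.
REGS="nvcc -arch=$ARCH -maxrregcount=32 --ptxas-options=-v -o app_regs $SRC"

for cmd in "$BASE" "$FAST" "$REGS"; do
    echo "$cmd"
    # Run the command only when both nvcc and the source file are present.
    if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
        $cmd
    fi
done
```

Timing the three resulting binaries against each other is a simple way to see whether either option matters for a given kernel; in many cases the difference will be small.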

Hi avidday,

Thanks! That really helped… now I guess I can go back to the programming guide with a clearer idea of what to look for.

Actually, my problem was that interrupting code execution at runtime seems to cause CUDA to fail to release memory on the GPU. As I had read that this issue should be fixed in the newest versions of CUDA (and I work with 3.2), I thought this behavior could be corrected with some special compilation flag.