Simplest programming environment (editor) for CUDA?

Hi all,

Actually, perhaps the topic should say: simplest programming environment for CUDA (an editor that makes it easy to compile CUDA).

I want to mess around with CUDA a bit, and from the looks of the sample code, it seems like modifying some of it should be easy. I’ve also looked around online and seen several tutorials that even I understand!

I know Java, C, PHP, and a bit of C++. However, I do not consider myself a great programmer by any standard. When I program, I’m always using Dev C++, or JGrasp for Java. I use their built-in compilers, and as such, I never need to worry about various (i.e., technical) build instructions or command-line instructions.

So basically, is there a way I can work with CUDA in a simple manner? Just as Dev C++ is to C or C++, is there any programming environment made for CUDA (meaning an easy editor)?

I’ve tried to use MSVS2005 with CUDA 2.0 and have received NUMEROUS errors. I’ve never been able to build any program. I tried to search for solutions and edit some of the settings, but perhaps this is beyond me?

After installing CUDA, I was ideally expecting to be able to simply open up one of the sample programs, “build” it, and have it run. I didn’t even get close. But this could be because I have ZERO experience with Visual Studio.

So can anyone help me out here? Supposedly release 2.1 allows one to use MSVS2008. Does this help any? Are things easier?

Is there some tutorial that shows me how to simply compile and then run CUDA programs?

Thanks for all your help.

Mind you, I’m not experienced with MSVS 2008 either. I just don’t want anyone to think I know that program. I’ve never used it.

But perhaps it, or some other program, is user-friendly for compiling and running CUDA programs?

Personally, I use Eclipse with the C/C++ toolkit … it’s only so-so for building, but for writing CUDA programs it’s quite nice. It’s good at understanding sources, figuring out what comes from which header, jumping around in code, etc.

Textmate works fine on a Mac.

I used to have Windows XP Pro x64, with the older 2.0 release of CUDA, and I had tons of errors with VS2005. Since then, I’ve switched to Vista Business x64, CUDA 2.1, and VS2008, and haven’t had any problems. Just make sure after you install the toolkit and the SDK, that you do the following:

  • Open “My Computer” (or explorer, or whatever) and navigate to C:\ProgramData\NVIDIA Corporation\NVIDIA CUDA SDK\common
  • Double-click on cutil_vc90.sln (assuming you’re using CUDA 2.1 and VS2008); when the solution loads up, you will see a drop-down menu for the build configuration. If you are on a 64-bit platform, you need to change this from Win32 to x64.
  • Build the solution.
  • Look up top again – you should see the build configuration menu that says “Debug”. Change it to “Release” and build the solution again.
  • Close Visual Studio.
  • Back in the explorer window, find the file “paramgl_vc90.sln”. Double-click it to open that solution.
  • Repeat the same configuration setup and build process as described above, then close Visual Studio.

At this point, you should be able to compile the SDK projects; if you are using VS2008, make sure you open the solutions ending in _vc90.sln. Again, if you’re on x64, you need to also make sure to set the build platform to “x64” in that drop-down menu.

If you get this far, and you’re ready to write your own projects, check out the “template” project that comes with the SDK. You should be able to make a copy of that and use it for your own stuff, with the compiler settings (for CUDA, that is) already set up.

Alright! Seems like we’re getting somewhere now.

Let me add some details. I’m using 32-bit Vista Ultimate. I have not installed either MSVS2008 or CUDA 2.1. But I’m guessing from your response above that you feel I should be okay doing this?

Building and running should be straightforward?

If so, I can download and install them tomorrow.

Also, is there any reason why a person cannot simply use Notepad and then build/run from the command line? Sorry if this is a stupid question. But in the same way that I can compile Java programs from the command line, can I compile CUDA programs this way and simply use Notepad as my editor?

If so, why all the fuss with MSVS? Is it SO amazing or so helpful? Again, I’ve NEVER used it at all, so I’m unaware of the benefits.


IDEs give you 1) autocompletion, 2) syntax coloring, 3) live syntax-error detection, 4) source navigation, etc. It’s a matter of convenience, but such great convenience that it’s comparable to living in a house vs. living in a tent.

I imagine you’ll be fine. You can also skip the part about building the x64 binaries, since those won’t apply to you. Install MSVS2008, the CUDA 2.1 driver, the CUDA Toolkit, and the CUDA SDK (in that order), then build the projects as directed above, and that should be it.

The simplest is Notepad under Linux + the command line:

nvcc -o executable source.cu

:yes:

Okay. Good deal. I’m now running MSVS2008 (have not loaded SP1 yet), and installed everything in the order you specified. It WORKS!

I can finally open up a 2008 MSVS solution file (the bandwidth solution, for example), and it builds properly. Is there a way to run it within MSVS2008? I could not find that, so I simply navigated to the folder (via My Computer) and double-clicked the newly generated executable file.

But bottom line, I am now able to proceed.

Now on to tutorials…

Previously, I’ve seen and read through the following tutorial at Dr. Dobbs:

It’s pretty basic but goes through the process nicely. Now that I’m finally up and able to build these pre-made solutions, I was expecting to be able to simply make a new BLANK file in MSVS, cut and paste the first .cu program shown on that website into this .cu file, and then build?

Apparently not?

Again, I have zero knowledge of MSVS, and no doubt that is one of the many bottlenecks here.

I’m guessing I need to make a new project file, but when I do so, it asks what project type (Visual C, C#, C++, etc.), and upon selecting one of them, it loads up a project with several header files, several source files, etc. I’m guessing this is the wrong direction to go in?

So then I turn to the files given in the CUDA SDK, specifically the template project. So I load up that solution, and it thankfully only has the .cu file as part of the solution… nothing else.

BUT, there’s a good bit of “stuff” already in that .cu file.

So, bottom line, how do I make this happen? I want to run through the examples on the tutorial website (given above). The first example makes PERFECT sense to me. But how do I make a new file/project and run this thing in MSVS?

Thanks for your patience.

Let me add that I’ve tried “New -> File” and simply chose a text file from the options. I then pasted the aforementioned tutorial code into this blank document and saved it with a .cu extension, so it is NOT a text file.

However, how do I then compile and run this .cu file within MSVS?


In VS2008, either hit the “Play” button (a green arrow in the toolbar) or go to the Debug menu, then Debug. Or, just to run it, go to the Debug menu and then Start Without Debugging. If you want to run it on the actual GPU, make sure that your build mode is not set to one of the “Emu____” options (up in the toolbar).

Basically, you’re going to want to copy the template project folder out of the SDK and then open the copy in VS. The reason being that, by default, VS2008 (obviously) uses its own compiler. To compile a CUDA project, there are extra build steps (use the CUDA compiler for the .cu files) which need to be set up in each project you create – so to save time, it’s much faster to use the template project. Just make sure that you take out any headers/functions which are not necessary to run your code. Then you can add extra .cu files, or whatever else, to the project as needed.

If you don’t mind, what are these steps? If I made a new solution in MSVS and then added a new .cu file to it, such as the file mentioned earlier from the Dr. Dobbs tutorial, how would I then tell MSVS to use the CUDA compiler? Because you are right: when I try to Build, it doesn’t understand how to compile the .cu file.

So if I use the template project, do I NEED all three files? I’m sure you are familiar with it and the fact that the template project has template_gold.cpp, a kernel file, and a main file.

Do I need all three of those files?

I just “removed” template_gold.cpp, and, as you probably already know, it did not Build properly. Apparently, template_gold.cpp is referenced from the main file. So I added it back and the Rebuild worked.

So do I need that file or not? What is it for?

I just want to run through these Dr. Dobbs examples in the tutorial. Are you saying that I basically need to take his first .cu program, cut it, and paste it into the template’s main .cu file?

Is it perhaps easier to just change whatever options are necessary to get MSVS to compile using CUDA?

If you open an SDK project in MSVS and then open up the properties for the .cu files, there is a Custom Build Step option, which has things like Command Line, Description, etc. These are the instructions that tell MSVS to use the nvcc compiler, instead of the standard compiler, to build the CUDA code. So any .cu file you add will need this information.
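For reference, the Command Line in that Custom Build Step looks roughly like the sketch below. This is illustrative only, not the SDK’s exact command (the real one carries many more flags and changes between SDK versions); the $(…) names are standard Visual Studio macros.

```text
"$(CUDA_BIN_PATH)\nvcc.exe" -ccbin "$(VCInstallDir)bin" -c
    -I"$(CUDA_INC_PATH)" -I"../../common/inc"
    -o "$(ConfigurationName)\$(InputName).obj" "$(InputPath)"
```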

Also, in the project properties, under C++ -> Additional Include Directories you should see $(CUDA_INC_PATH), and under Linker -> Additional Library Dependencies you should see $(CUDA_LIB_PATH).

The template project you describe is similar to most of the SDK projects. One file to drive the application, one (kernel) to hold the CUDA kernel code, and one (gold) to hold the equivalent operations in CPU C++ code for comparison. But in a regular project, the only code that has to be in a .cu file is the kernel code and the kernel invocation. Anything else (including the CUDA runtime API calls) can be in a C++ file. A minimal CUDA project can just be a single .cu file with main() and the kernel definition in the same file.
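To sketch that last point, here is roughly what such a single-file project could look like (the kernel and sizes are made up for illustration, and error checking is omitted for brevity):

```cuda
// minimal.cu — a complete CUDA program in one file.
// Build from the command line with: nvcc -o minimal minimal.cu
#include <stdio.h>

// Kernel: square each element of the array (illustrative example).
__global__ void square(float *a, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)            // guard: the last block may be partially full
        a[idx] *= a[idx];
}

int main(void)
{
    const int N = 1024;
    float h_a[N];
    for (int i = 0; i < N; ++i)
        h_a[i] = (float)i;

    // Allocate device memory and copy the input up.
    float *d_a;
    cudaMalloc((void **)&d_a, N * sizeof(float));
    cudaMemcpy(d_a, h_a, N * sizeof(float), cudaMemcpyHostToDevice);

    // Launch: enough blocks to cover N elements (ceiling division).
    int blockSize = 256;
    int nBlocks = (N + blockSize - 1) / blockSize;
    square<<<nBlocks, blockSize>>>(d_a, N);

    // Copy the result back and clean up.
    cudaMemcpy(h_a, d_a, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_a);

    printf("a[3] = %f\n", h_a[3]);
    return 0;
}
```

Both the runtime API calls and the kernel live in the same .cu file, so nvcc handles everything.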

notepad exists for linux, as well? Hmm…

Okay, now I’m really trying to make sense of this.

I’m going through this trivial example at Dr. Dobbs:

And this got me thinking: how does one know how large blockSize and nBlocks can be? I see some of these trivial examples try to calculate them dynamically using the aforementioned variable names. But you can obviously run into problems if you end up specifying too many blocks or too large a block size.

Do I understand correctly that this is hardware-dependent? Turning to A.1.1 of the CUDA Programming Guide, I see that for my GPU the maximum # of threads per block (blockSize) is 512. Then I see that the maximum # of active blocks per multiprocessor is 8. But there is also an overall limit on the number of active threads per multiprocessor: 768.

So, if the GPU is cheaper (or older) and the # of MPs is only 4, the max # of active threads would be 768 * 4 = 3072.

Okay, I had just spent a while typing out an example here, to hopefully illustrate my confusion and to raise a question. But after twenty minutes, even it didn’t make sense…so it’s gone.

Bottom line, how do I know how large the values in the execution configuration of the kernel call can be? I believe it is correct to say that the absolute maximum for blockSize (# threads/block) is set by the hardware. Correct?

So how many blocks can I use? The only thing I see in A.1.1 is about the number of active blocks. So what value (or max value) can I use for nBlocks? Is there a maximum? I know there is a max of 8 active blocks per MP (for my GPU). So can I specify more than that? Are the other blocks just not active then?

So what if I have an array of size 100,000,000, and, for the sake of a trivial example, I want to square each value of this array?

Does this mean that I cannot invoke the kernel with the following?


int blockSize = 512;
int nBlocks = N / blockSize + (N % blockSize == 0 ? 0 : 1);
// With N = 100,000,000, this gives over 195,000 blocks. Is that possible?

incrementArrayOnDevice<<<nBlocks, blockSize>>>(a_d, N);
// Here a_d is simply the array on the device.

There are a lot of questions here. Hopefully you can clear some of this up.


I figured that since you are all helping me out here, the least I can do is play with some of my questions.

So I just took the Dr. Dobbs example from the previous link, and I confirmed that 512 is indeed the max blockSize I can use, as this seems to be strictly hardware-dependent: I can’t request more than 512 threads/block when my GPU can handle a max of 512 threads/block. So that part makes sense.

Then I started playing with the value of N. I changed it to 100,000. That comes to 196 blocks. No problems. Considering the max # of active blocks is 8 per MP, and my cheaper GPU has only 4 MPs, that means a max of only 32 active blocks. Yet I can call the kernel with 196 blocks?

So what is the limitation?

I then changed my N to 100 million, resulting in approximately 195,000 blocks. This caused some errors. So again, how do I know the max # of blocks I can request? Is it from the following line in A.1.1: “The maximum size of each dimension of a grid of thread blocks is 65535”?

And lastly, is there an optimal # of threads to request per block? By searching the forum, I’ve come across people giving the number 64. Is there a rhyme or reason to this? This person also said, if I read correctly, that 256 is an optimal # of threads per MP? Does that make sense?

And is there a more effective # of blocks to request, or is that simply based on how large the array is (in this trivial example)?

Again, thanks for the help.