CUDA 4.1 RC2 is now available

New compiler front end (LLVM-based), a new visual profiler (nvvp), and more.

This is the first public release.

Some feedback on the release:
I’m seeing a 40% speed increase over the old developer driver and CUDA 4.0 toolkit. Much higher than the 10% I was hoping for. Many thanks :)

I got an “out of memory” error in my app with Ubuntu 11.10 x86-64, driver 285.05.23, and CUDA 4.1 RC2 :(
It works fine with driver 285.05.09 and the 4.0 release, and all previous releases.
I can’t wait to check out the speed increase! I’ve got a paper about to be submitted and want to try this new release.
All the previous issues I had were related to driver problems :S

So, does anyone know how to use the LLVM compiler with Visual Studio? I am guessing that nvcc isn’t the new compiler? Is it the cicc.exe binary?

Nothing has changed in the actual user-level usage of the compiler with CUDA 4.1, so integration into Visual Studio should work just like before.
nvcc is still the official user interface: it serves as the driver program that invokes the various components of the compiler. Instead of opencc (the Open64-based frontend), it now calls cicc (the LLVM-based frontend).
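If you want to see which frontend your build invokes, nvcc can list the individual compilation steps without running them. A minimal sketch (file name is made up; exact tool names and paths vary by toolkit version):

```shell
# Print the compilation commands nvcc would run, without executing them.
# For sm_2x targets the listed steps should include cicc (LLVM frontend);
# for sm_1x targets they go through the Open64 tools instead.
nvcc --dryrun -arch=sm_20 kernel.cu
```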

OK, thanks for the clarification. I guess I was just puzzled because I didn’t see any apparent change in compilation time, and couldn’t see any change in the formatting of the compiler’s output text. I didn’t check it with compiler errors, though (I was looking for the nice text-based LLVM error messages).

Does this release include the source code of the CUDA LLVM compiler?

Or do we need to wait for the final 4.1 release instead of the candidate?


The source code will not be available in the general release.
If you are interested, you should fill out this form:

On my end, something that was perhaps only more or less legal before seems not to work at all anymore.
My project is quite large, spanning multiple files for host code and multiple files for device code. All of these files share resources such as textures and constant memory, however. I have these items declared in a single file that is #included by every other file that needs them. I am not all that knowledgeable when it comes to compilation mechanics, but it would seem that this puts them all in the same file scope(?).

The way I compile the project (Windows 7, 64-bit) is to compile only a single .cu file, with every other .cu file #included within it. This used to get around the requirement that textures be declared at the file scope of the compiled unit. However, this does not seem to work anymore (it might be as old as a CUDA 4.0 issue; it works in CUDA 3.2). Now, when I bind, I get an “invalid texture reference” error.

Is that something that intentionally no longer works?
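For what it’s worth, the setup described above can be sketched roughly like this (all file and symbol names are made up for illustration):

```cuda
// shared.cuh -- hypothetical common header, #included by every file
// that needs these resources. Texture references and __constant__
// symbols must be declared at file scope.
texture<float, 1, cudaReadModeElementType> gTex;  // assumed name
__constant__ float gCoeffs[16];                   // assumed name

// main.cu -- the single file actually passed to nvcc. Including all
// other .cu files pulls everything into one translation unit, so every
// kernel sees the same file-scope texture and constant declarations.
#include "shared.cuh"
#include "kernels_a.cu"  // hypothetical device-code files
#include "kernels_b.cu"
```

The binding call (e.g. cudaBindTexture on gTex from host code in the same translation unit) is what reportedly now fails with “invalid texture reference”.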


Got it, much appreciated.

Do these changes apply only to sm_2x? I did not notice any difference in the PTX code for sm_1x.

Correct, compilation for sm_1x still goes through the Open64-based compiler, while compilation for sm_2x and up goes through the LLVM-based compiler.
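One way to compare the two paths is to emit PTX for each architecture (file names are made up; output will differ between toolkit versions):

```shell
# Generate PTX for an sm_1x and an sm_2x target from the same source.
# sm_13 goes through the Open64-based frontend (opencc),
# sm_20 goes through the LLVM-based frontend (cicc).
nvcc -arch=sm_13 -ptx kernel.cu -o kernel_sm13.ptx
nvcc -arch=sm_20 -ptx kernel.cu -o kernel_sm20.ptx
```

Differences in register allocation, instruction scheduling, and PTX formatting between the two outputs come from the different frontends, so identical-looking sm_1x output is expected.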