CUDA 2.1 beta

Debugger (32-bit Linux only)



Fedora8 32-bit:
Fedora8 64-bit:
Fedora9 32-bit:
Fedora9 64-bit:
RHEL4.7 32-bit:
RHEL4.7 64-bit:
RHEL5.2 32-bit:
RHEL5.2 64-bit:
SLED10-SP2 32-bit:
SLED10-SP2 64-bit:
SUSE10.3 32-bit:
SUSE10.3 64-bit:
SUSE11.0 32-bit:
SUSE11.0 64-bit:
Ubuntu7.10 32-bit:
Ubuntu7.10 64-bit:
Ubuntu8.04 32-bit:
Ubuntu8.04 64-bit:
Windows 32-bit: CudaSetup-win32-rel-nightly-2.1.1635-3046817.exe
Windows 64-bit: CudaSetup-win64-rel-nightly-2.1.1635-3046817.exe

try it~~~ :thumbup:

Thanks for the updates!
Can’t wait to use the debugger.

I know the debugger manual says that a 32-bit OS is needed, but I'm wondering whether a 64-bit OS and 64-bit driver can be used to debug a 32-bit CUDA application.

Just a quick question, because tomorrow I will have time to do some work on my Linux CUDA machine: will debugging on 64-bit Linux be supported in 2.1 final? Otherwise I will install a 32-bit Windows next to my 64-bit version.

Other than that: hooray!! (although it would have been easier if the Linux distribution had been visible without looking at the link location ;))

The 2.1 release will not add the debugger for 64-bit Linux. Also, note that the debugger is supported only on 32-bit Linux (you mentioned installing 32-bit Windows, which does not have the debugger).


Device ordering has changed. Assume nothing about device ordering.
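
One way to avoid depending on device order is to choose the device by its properties rather than by a fixed index. The sketch below is only an illustration: `DeviceInfo` and `pickDevice` are hypothetical names, and the struct stands in for the `major` and `minor` fields of `cudaDeviceProp`, which real code would fill by looping over `cudaGetDeviceProperties`.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the cudaDeviceProp fields used here.
// In real code these values would come from cudaGetDeviceProperties().
struct DeviceInfo {
    std::string name;
    int major;  // compute capability, major part
    int minor;  // compute capability, minor part
};

// Returns the index of the device with the highest compute capability,
// or -1 if the list is empty. Ties go to the lowest index.
int pickDevice(const std::vector<DeviceInfo>& devices) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(devices.size()); ++i) {
        if (best < 0 ||
            devices[i].major > devices[best].major ||
            (devices[i].major == devices[best].major &&
             devices[i].minor > devices[best].minor)) {
            best = i;
        }
    }
    return best;
}
```

The returned index is what would then be passed to `cudaSetDevice`, so the choice stays the same regardless of the order in which the runtime enumerates the devices.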

(“watchdog timer enabled” will be an option you can query for in 2.1 final, but it didn’t make it into the beta. Sorry to those of you who care a lot about the watchdog timer :( )

Hmm, I indeed meant linux32 :) Okay, good to know, then I can use the downtime of the machine to install a 32bit version next to it.



I can’t help but notice there are no release notes for this beta? (e.g. what’s changed, and why should I even try this beta?)

I’m assuming they’re in the toolkit, but I have no reason to download the toolkit without knowing what’s changed ;)

Tried OpenSUSE 10.3 toolkit on an (unsupported) 10.2 box, and it does not work (some weird crashes that do not reproduce in gdb). But as the opensuse folks discontinue the 10.2 release on December 1st anyway, upgrading makes an awful lot of sense. Just for the record, in case others want to try this.

Ah, finally!!

I can’t wait to see if I can use my secondary GPU. :thumbup:

YEEEEEEEEEEEEEEEEEEEEEES!!! Found 2 compatible devices!!

Thank you NVIDIA!!!

:) Can we compile for multi-core architectures with this version?


I was also trying to look into that, but it doesn’t seem possible (nvcc doesn’t recognize --multicore, nor does it offer any similar argument).

Also, when compiling .cu files with nvcc -g -G (required by cuda-gdb), I get the following errors (caused by -G):

./liblissom.a(retinaLGN.o): In function `$$SymbolTable':
(.nv11Segment+0x498): undefined reference to `$gpu_registers'
./liblissom.a(retinaLGN.o): In function `$$SymbolTable':
(.nv11Segment+0x900): undefined reference to `blockDim'
./liblissom.a(retinaLGN.o): In function `$$SymbolTable':
(.nv11Segment+0x90c): undefined reference to `gridDim'
./liblissom.a(retinaLGN.o): In function `$$SymbolTable':
(.nv11Segment+0x918): undefined reference to `blockIdx'
./liblissom.a(retinaLGN.o): In function `$$SymbolTable':
(.nv11Segment+0x924): undefined reference to `threadIdx'
./liblissom.a(CUDALISSOM.o): In function `$$SymbolTable':
(.nv11Segment+0x910): undefined reference to `$gpu_registers'
./liblissom.a(CUDALISSOM.o): In function `$$SymbolTable':
(.nv11Segment+0x19a8): undefined reference to `blockDim'
./liblissom.a(CUDALISSOM.o): In function `$$SymbolTable':
(.nv11Segment+0x19b4): undefined reference to `gridDim'
./liblissom.a(CUDALISSOM.o): In function `$$SymbolTable':
(.nv11Segment+0x19c0): undefined reference to `blockIdx'
./liblissom.a(CUDALISSOM.o): In function `$$SymbolTable':
(.nv11Segment+0x19cc): undefined reference to `threadIdx'
collect2: ld returned 1 exit status

This happens when I link the .cu objects with a C++ application.

Is this a known problem?

I am getting the following error when running CUDA from within matlab.

Cuda error: after setdevice
in file ‘’ in line 257 : setting the device when a process is active is not allowed.

I have for now disabled the setting of the device because my fastest device is now device 0, but this looks like an error.

__threadfence() and __threadfence_block() are mentioned in the release notes, but I can’t find anything in the documentation on them. What do they do?

Any release notes, changelogs and such?

I’m not sure if __threadfence() actually made it into this version or not, but it’s basically a global synchronization barrier with all the caveats of doing global synchronization some other way (e.g., if every thread isn’t running, congratulations, you just deadlocked the card). (note: this isn’t really accurate, see paulius’ description later in the thread)

E.D.: cudaSetDevice(n) was changed in this version to throw an error if you’re calling cudaSetDevice after a context has been created. Before, it would succeed but not really do anything.

Install CudaSetup. In the doc directory you can find the release notes and revision history:

Thanks! I didn’t install anything as I don’t want to mess up my current install.

Those new features look nice, let’s hope the release version follows soon enough :)

Ah, that is nice to have if you are doing some kind of fixed size processing, where you know for sure that everything will be running at the same time. It might be possible to make things a bit more optimal that way. (with a big message about hardware requirements ;))

Hmm, the code I have there is copied from an SDK example I believe:

[codebox]void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    int dev, useDev = 0, deviceCount;

    // Assumed: this call is missing from the pasted code, but deviceCount must be set.
    cudaGetDeviceCount(&deviceCount);

    for (dev = 0; dev < deviceCount; dev++) {
        cudaDeviceProp deviceProp;
        cudaGetDeviceProperties(&deviceProp, dev);
        if (dev == 0) {
            if (deviceProp.major < 1)
                mexPrintf("There is no device supporting CUDA.\n");
            else if (deviceCount == 1)
                mexPrintf("There is 1 device supporting CUDA\n");
            else
                mexPrintf("There are %d devices supporting CUDA\n", deviceCount);
        }
        if (deviceProp.minor == 3) {
            // Assumed body: remember the compute capability 1.3 device.
            useDev = dev;
        }
    }

    // USE the GT200
    cudaSetDevice(useDev); // This is the cudaSetDevice call that gives an error.
    cudaDeviceProp deviceProp;
    cudaGetDeviceProperties(&deviceProp, useDev);
    // The last argument was cut off in the post; deviceProp.name is assumed.
    mexPrintf("\nDevice Selected %d: \"%s\"\n", useDev, deviceProp.name);
}[/codebox]

I cannot see where I am creating a context, and I do not know how else to make sure I am using the device I want to use. Luckily the GT200 is device 0 with the new ordering I get in 2.1 so I am safe at this time…