CUDA Toolkit 3.2 release candidate available to registered developers

As the topic says, it’s now available to registered devs, and it adds a lot of stuff. Things I like (well, things I did a lot of work for, which means I can remember them…):

  • TCC is a per-device property of Tesla cards now, so you can run TCC cards alongside WDDM cards. And there's one other TCC-related thing I can't talk about yet, I think…
  • cuStreamWaitEvent does exactly what it sounds like: inter-stream synchronization without using CPU resources (see the sketch after this list)
  • the driver API has been reworked in major ways to support 64-bit addressing on devices

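For the curious, here's a minimal sketch of that inter-stream sync, written against the runtime-API twin cudaStreamWaitEvent (cuStreamWaitEvent is the driver-API form); the kernels and sizes below are just placeholders:

    __global__ void produce(float* buf) { buf[threadIdx.x] = (float)threadIdx.x; }
    __global__ void consume(float* buf) { buf[threadIdx.x] *= 2.0f; }

    int main()
    {
        float* buf;
        cudaMalloc((void**)&buf, 256 * sizeof(float));

        cudaStream_t producer, consumer;
        cudaEvent_t done;
        cudaStreamCreate(&producer);
        cudaStreamCreate(&consumer);
        cudaEventCreate(&done);

        // Queue work in the producer stream, then an event marking its completion.
        produce<<<1, 256, 0, producer>>>(buf);
        cudaEventRecord(done, producer);

        // The consumer stream waits for the event on the GPU itself;
        // the CPU never blocks or polls to enforce the ordering.
        cudaStreamWaitEvent(consumer, done, 0);
        consume<<<1, 256, 0, consumer>>>(buf);

        cudaThreadSynchronize();
        cudaFree(buf);
        return 0;
    }
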
There are a ton of other new things, too. You should probably give it a try! Feel free to tell me that I ruined everything at GTC :)

A couple more highlights from the e-mail:

Cool, can’t wait to try this one out.

Whoah, a LOT of people will be happy about that.

You mean that cuda-memcheck didn’t work before on Fermi? Hmm…

TCC debugging in Nsight was the other TCC feature I referred to. It’s all pretty fancy.

I assume the malloc() and free() support on the device in CUDA 3.2 is the main prerequisite for C++ new and delete support in a future (maybe even next??) release?
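
For reference, device-side malloc()/free() in 3.2 looks roughly like this (a minimal sketch; the kernel and sizes are made up, and it needs a Fermi-class device built with -arch=sm_20):

    // Each thread grabs a small scratch buffer from the device heap.
    __global__ void scratch(int* out)
    {
        int* buf = (int*)malloc(16 * sizeof(int));
        if (!buf) return;                     // the device heap can run out
        for (int i = 0; i < 16; ++i)
            buf[i] = threadIdx.x + i;
        out[threadIdx.x] = buf[15];
        free(buf);
    }

    int main()
    {
        // Optionally grow the device heap before the first launch.
        cudaThreadSetLimit(cudaLimitMallocHeapSize, 8 * 1024 * 1024);

        int* out;
        cudaMalloc((void**)&out, 32 * sizeof(int));
        scratch<<<1, 32>>>(out);
        cudaThreadSynchronize();
        cudaFree(out);
        return 0;
    }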

CUSPARSE has my attention…

Two other little things:

  • a user can now access a subset of GPUs by having RW privileges on /dev/nvidiactl and on only a subset of the /dev/nvidia[0…n] nodes, instead of the CUDA driver throwing an error whenever any node is inaccessible; devices a user doesn't have permissions for simply won't be visible to the app (think CUDA_VISIBLE_DEVICES version 2.0; see the snippet after this list)
  • latency on streamed async copies on GF100+ is much improved

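A quick way to see which devices a process actually got is plain device enumeration (a sketch, nothing 3.2-specific about the calls themselves):

    #include <cstdio>

    int main()
    {
        // Only the /dev/nvidia* nodes this process can open show up here;
        // the rest are silently hidden rather than causing a driver error.
        int n = 0;
        cudaGetDeviceCount(&n);
        for (int i = 0; i < n; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            printf("device %d: %s\n", i, prop.name);
        }
        return 0;
    }
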
wow… I'm presenting my master's thesis on sparse matrix computations (among other things) on CUDA on 22 September (SpMV: 34 Gflop/s peak SP, 19 Gflop/s peak DP on a GTX 285!)… lol, CUSPARSE…

Maybe you can try overloading operator new and delete to use malloc…
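
If the 3.2 compiler accepts new/delete expressions in device code at all (it may not; this is untested speculation), the overloads might look something like this sketch:

    #include <cstddef>

    // Route device-side new/delete through the device heap (sm_20+).
    __device__ void* operator new(size_t size)
    {
        return malloc(size);
    }

    __device__ void operator delete(void* ptr)
    {
        free(ptr);
    }

    struct Vec3 {
        float x, y, z;
        __device__ Vec3() : x(0.f), y(0.f), z(0.f) {}
    };

    __global__ void demo()
    {
        Vec3* v = new Vec3;   // goes through the overload above
        v->x = (float)threadIdx.x;
        delete v;
    }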

Ahem. I tested the new 64-bit toolkit and code samples on two different Windows 7 x64 machines, with developer drivers 260.61 and 260.63.

On the old 8800 GTS cards (only present to get Nsight running) everything's fine, but on our GTX 480s most of the NVIDIA samples and all of my own programs terminate with "cudaErrorDevicesUnavailable (all CUDA-capable devices are busy or unavailable)".

Any clue on that?

I moved to 3.2, and when I tried to compile some of my older code I got the following error:

nvcc error : ‘cudafe’ died due to signal 11 (Invalid memory reference)
make: *** [obj/x86_64/release/main.cu_20.o] Error 11

The same code compiles without any issues under CUDA 3.1. Am I missing something?

Furthermore, I have no such problems when compiling the SDK examples!

Another thing I observed in my quick tests with 3.2: a big drop in OpenCL performance!

I think this has to do with the new dev driver?

Anyone else able to confirm this?