As the topic says, CUDA 3.2 is now available to registered devs, and it adds a lot of stuff. Things I like (well, things I did a lot of work for, which means I can remember them…):
TCC is a per-device property of Tesla cards now, so you can run TCC cards alongside WDDM cards (and there's one other TCC-related thing I can't talk about yet, I think…)
cuStreamWaitEvent does exactly what it sounds like: inter-stream synchronization without using CPU resources (see the sketch after this list)
the driver API has been reworked in major ways to support 64-bit devices
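A minimal sketch of the inter-stream sync pattern, using the runtime API counterpart cudaStreamWaitEvent (the kernel names here are made up for illustration): a consumer kernel in one stream waits, on the device, for a producer kernel in another stream, with no host thread blocking or polling.

```
#include <cuda_runtime.h>

__global__ void producer(float *buf) { buf[threadIdx.x] = (float)threadIdx.x; }
__global__ void consumer(float *buf) { buf[threadIdx.x] += 1.0f; }

int main() {
    float *buf;
    cudaMalloc(&buf, 256 * sizeof(float));

    cudaStream_t streamA, streamB;
    cudaEvent_t produced;
    cudaStreamCreate(&streamA);
    cudaStreamCreate(&streamB);
    cudaEventCreate(&produced);

    producer<<<1, 256, 0, streamA>>>(buf);
    cudaEventRecord(produced, streamA);

    // the wait is queued on the GPU: nothing in streamB runs until
    // 'produced' fires in streamA, and no CPU cycles are spent waiting
    cudaStreamWaitEvent(streamB, produced, 0);
    consumer<<<1, 256, 0, streamB>>>(buf);

    cudaStreamSynchronize(streamB);

    cudaEventDestroy(produced);
    cudaStreamDestroy(streamA);
    cudaStreamDestroy(streamB);
    cudaFree(buf);
    return 0;
}
```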
There are a ton of other new things, too. You should probably give it a try! Feel free to tell me that I ruined everything at GTC :)
I assume the malloc() and free() support on the device in CUDA 3.2 is the main prerequisite for C++ new and delete support in a future (maybe even next??) release?
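For context, a minimal sketch of the device-side heap in 3.2 (Fermi-class hardware only, compile with -arch=sm_20; cudaThreadSetLimit/cudaThreadSynchronize are the 3.2-era runtime names for what later became cudaDeviceSetLimit/cudaDeviceSynchronize):

```
#include <cuda_runtime.h>

__global__ void scratch(int n) {
    // each thread grabs its own scratch buffer from the device heap
    int *buf = (int *)malloc(n * sizeof(int));
    if (buf == NULL) return;               // the device heap can run out
    for (int i = 0; i < n; ++i) buf[i] = i;
    free(buf);                             // device-side free
}

int main() {
    // the device heap size must be set before the first kernel launch
    cudaThreadSetLimit(cudaLimitMallocHeapSize, 8 * 1024 * 1024);
    scratch<<<1, 32>>>(16);
    cudaThreadSynchronize();
    return 0;
}
```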
a user can now access a subset of GPUs by having RW privileges on /dev/nvidiactl and on only a subset of the /dev/nvidia[0…n] nodes, rather than the CUDA driver throwing an error if any node is inaccessible; devices the user lacks permissions for simply aren't visible to the app (think CUDA_VISIBLE_DEVICES version 2.0; a quick visibility check is sketched below)
latency on streamed async copies on GF100+ is much improved (the second sketch below shows the pattern)
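A quick host-side check (my own sketch, not from the release notes) of which devices the driver actually exposes when node permissions are restricted:

```
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);   // only devices whose nodes you can open are counted
    printf("visible CUDA devices: %d\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("  %d: %s\n", i, prop.name);
    }
    return 0;
}
```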
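And a minimal sketch of the streamed-async-copy pattern the latency improvement applies to; note that the copy is only truly asynchronous from page-locked host memory:

```
#include <cuda_runtime.h>

int main() {
    const size_t n = 1 << 20;
    float *h_buf, *d_buf;
    cudaMallocHost(&h_buf, n * sizeof(float));   // page-locked host buffer
    cudaMalloc(&d_buf, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // returns to the host immediately; the copy is queued on 'stream'
    cudaMemcpyAsync(d_buf, h_buf, n * sizeof(float),
                    cudaMemcpyHostToDevice, stream);
    // ... launch kernels in other streams here to overlap with the copy ...
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}
```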
wow… I'm presenting my master's thesis on sparse matrix computations (among other things) on CUDA on 22 September (SpMV: 34 Gflop/s peak SP, 19 Gflop/s peak DP on a GTX 285!)… lol, CUSPARSE…
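For anyone who hasn't met SpMV, the baseline kernel looks like this (a generic scalar CSR sketch, one thread per row; not the poster's thesis code and not CUSPARSE): y = A*x with A stored in compressed sparse row format.

```
__global__ void spmv_csr(int rows, const int *rowPtr, const int *colInd,
                         const float *val, const float *x, float *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rows) {
        float sum = 0.0f;
        // accumulate the nonzeros of this row against x
        for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
            sum += val[j] * x[colInd[j]];
        y[row] = sum;
    }
}
```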
Ahem. I tested the new 64-bit toolkit and code samples on two different Windows 7 x64 machines, with dev drivers 260.61 and 260.63.
On the old 8800 GTS cards (only present to get Nsight running) everything's fine, but on our GTX 480s most NVIDIA samples and all of my own programs terminate with "cudaErrorDevicesUnavailable (all CUDA-capable devices are busy or unavailable)".