UVA in Windows 7? UVA use


I currently have CUDA 4.0; the OS is Windows 7, 64-bit.

I want to use UVA for transfers between the CPU and GPU. Do I need Linux for this? Or will a Fermi-architecture card (with the TCC driver model) do the trick?

Do I need GPUDirect?

For CUDA 4.0 I found these requirements:

[font=“Lucida Console”][i]Supported Platforms

Whether or not a device supports unified addressing may be queried by calling cuDeviceGetAttribute() with the device attribute CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING.

Unified addressing is automatically enabled in 64-bit processes on devices with compute capability greater than or equal to 2.0.

Unified addressing is not yet supported on Windows Vista or Windows 7 for devices that do not use the TCC driver model.[/i][/font]
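To check this on a given machine, a minimal driver-API sketch using the `cuDeviceGetAttribute()` call mentioned in the quote above (error checking omitted for brevity):

```cuda
// Sketch: query whether each device supports unified addressing,
// via the driver-API attribute named in the documentation above.
#include <cuda.h>
#include <stdio.h>

int main(void) {
    cuInit(0);
    int count = 0;
    cuDeviceGetCount(&count);
    for (int i = 0; i < count; ++i) {
        CUdevice dev;
        cuDeviceGet(&dev, i);
        char name[256];
        cuDeviceGetName(name, sizeof(name), dev);
        int uva = 0;
        cuDeviceGetAttribute(&uva, CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING, dev);
        printf("Device %d (%s): unified addressing %s\n",
               i, name, uva ? "supported" : "not supported");
    }
    return 0;
}
```

On a 64-bit Windows build without TCC, this attribute will come back 0 even on a compute-capability-2.0 card.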

Does anyone know what happens with toolkit 4.1?

The available cards are a GeForce GTS 450 and a GTX 460.

Thank you.


OK, I found it.

UVA works only on Linux, or on Windows 7 with the TCC driver model.

I don't know about GPUDirect; I switched to Linux and got my UVA working.

In order to use UVA in Windows 7, the TCC driver has to be loaded.

Does UVA then work only for remote desktop or other cluster services?

I have four K10 boards in a tower, and I'd like to use their combined 32 GB of memory
with UVA so all GPUs have roughly equal access to large data objects placed in that 32 GB pool.

I'm using CUDA 4.0, but will soon upgrade to 5.5, especially if it lets me do this.
I don't use clustering. Thanks in advance.

No, you don't need remote desktop: TCC mode turns the boards into raw compute devices.
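For reference, the driver model can usually be switched with nvidia-smi (run as administrator; a reboot may be required, and flag syntax varies by driver version, so check `nvidia-smi -h`):

```shell
# Hedged sketch: switch GPU 0 to the TCC driver model on Windows.
# 1 = TCC, 0 = WDDM. Note that TCC is generally supported only on
# Tesla/Quadro-class boards (such as the K10), not consumer GeForce cards.
nvidia-smi -i 0 -dm 1
```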

Yes, you can “pool” the memory of the boards as you describe, but be aware this will not give you good performance. Device memory throughput is hundreds of gigabytes per second per device; if memory accesses have to go out over PCIe, you'll have only ~5 GB/s of inter-device bandwidth in aggregate.
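A minimal sketch of what that pooling requires in code: with UVA active, enabling peer access between each device pair lets a kernel on one GPU dereference pointers into another GPU's memory directly (error checking omitted):

```cuda
// Sketch (runtime API): enable peer-to-peer access among all device pairs.
// Under UVA, once access is enabled, a pointer returned by cudaMalloc on
// one device can be dereferenced by kernels running on a peer device.
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        cudaSetDevice(i);                       // enable FROM device i
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int can = 0;
            cudaDeviceCanAccessPeer(&can, i, j);
            if (can)
                cudaDeviceEnablePeerAccess(j, 0);  // flags must be 0
            printf("peer %d -> %d: %s\n", i, j,
                   can ? "enabled" : "unavailable");
        }
    }
    return 0;
}
```

Whether `cudaDeviceCanAccessPeer` reports 1 depends on the topology; devices typically need to sit under the same PCIe root complex.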

With that giant and likely discouraging caveat, this may still be acceptable for a giant database you only rarely need to access. I do this myself in a Monte Carlo simulation, where a shared ~10 GB geometric database is held by the host and served to the devices on demand, with the data manually cached on each GPU as needed. The host CPU stays idle, and the GPUs themselves fetch what they want.
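A hedged sketch of that host-resident-database pattern (the kernel and names here are illustrative, not my actual simulation code): pin a large host buffer as mapped, zero-copy memory; under UVA the same pointer is valid in device code, so kernels fetch entries over PCIe on demand.

```cuda
// Sketch: host holds the database in mapped pinned memory; a kernel
// reads an entry directly across PCIe. Under UVA the host pointer can
// be passed to the kernel as-is, no cudaHostGetDevicePointer needed.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void readEntry(const float *db, size_t idx, float *out) {
    *out = db[idx];   // fetched from host memory across PCIe
}

int main(void) {
    size_t n = 1 << 20;                 // illustrative size, not 10 GB
    float *db = NULL;
    cudaHostAlloc((void **)&db, n * sizeof(float), cudaHostAllocMapped);
    for (size_t i = 0; i < n; ++i) db[i] = (float)i;

    float *out = NULL;
    cudaMalloc((void **)&out, sizeof(float));
    readEntry<<<1, 1>>>(db, 12345, out);

    float result = 0.0f;
    cudaMemcpy(&result, out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("db[12345] = %f\n", result);

    cudaFree(out);
    cudaFreeHost(db);
    return 0;
}
```

In practice you'd copy hot regions into device memory (the "manual caching" above) rather than re-reading over PCIe every time.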