When I run a program compiled with the option -ta=nvidia, I get the following message at execution time:
libcuda routine cuMemsetD32Async not loaded, exiting
My system has the following device:
CUDA Driver Version: 3000
NVRM version: NVIDIA UNIX x86 Kernel Module 195.36.24 Thu Apr 22 09:18:20 PDT 2010
Device Number: 0
Device Name: GeForce 310
Device Revision Number: 1.2
Global Memory Size: 536084480
Number of Multiprocessors: 2
Number of Cores: 16
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 16384
Registers per Block: 16384
Warp Size: 32
Maximum Threads per Block: 512
Maximum Block Dimensions: 512, 512, 64
Maximum Grid Dimensions: 65535 x 65535 x 1
Maximum Memory Pitch: 2147483647B
Texture Alignment: 256B
Clock Rate: 1402 MHz
Execution Timeout: Yes
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: No
ECC Enabled: No
Initialization time: 833 microseconds
Current free memory: 431493120
Upload time (4MB): 1877 microseconds (1403 ms pinned)
Download time: 1628 microseconds (1326 ms pinned)
Upload bandwidth: 2234 MB/sec (2989 MB/sec pinned)
Download bandwidth: 2576 MB/sec (3163 MB/sec pinned)