I ran the same test on a GTX 480, a standard GTX 580, and a Tesla C2070, and none of them failed. The 3GB of memory on my GTX 580 is accessible when it is split across many variables; for example, allocating 11 variables of 256MB each (2816MB in total) succeeds. So the 3GB of memory itself at least does not seem to be broken.
Does anyone know why this happens?
OS: Windows 7 64-bit
Driver Version: 263.14 (from the accessory CD), 266.58 (from the NVIDIA website)
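One way to narrow down where a single allocation starts to fail would be to bisect on the request size. Below is a minimal sketch, written in Python for brevity. The names `largest_single_alloc`, `try_alloc`, and `fake_alloc` are hypothetical and not from this thread, and the 1000 MiB limit in `fake_alloc` is an invented stand-in, not a measured value. On real hardware `try_alloc` would wrap a `cudaMalloc` call followed by `cudaFree`, and the search assumes that if a size fails, every larger size fails too.

```python
from typing import Callable

MIB = 1024 * 1024


def largest_single_alloc(try_alloc: Callable[[int], bool],
                         upper_bytes: int,
                         step_bytes: int = MIB) -> int:
    """Return the largest multiple of step_bytes (<= upper_bytes) for which
    try_alloc succeeds, assuming success is monotonic in the request size."""
    lo, hi = 0, upper_bytes // step_bytes  # search in units of step_bytes
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if try_alloc(mid * step_bytes):
            lo = mid        # this size fits, try something larger
        else:
            hi = mid - 1    # this size fails, try something smaller
    return lo * step_bytes


# Hypothetical stand-in for a device: single requests above 1000 MiB fail.
# A real probe would call cudaMalloc(nbytes), record success, then cudaFree.
def fake_alloc(nbytes: int) -> bool:
    return nbytes <= 1000 * MIB


if __name__ == "__main__":
    total = 3181838336  # global memory reported by deviceQuery on Windows
    print(largest_single_alloc(fake_alloc, total) // MIB, "MiB")
```

Running the same probe under both driver versions and comparing the result with the reported total would show whether the limit is a fixed size or tracks the amount of memory on the card.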
It sounds like the 1GB allocation worked on the GTX 480, the standard GTX 580, and the Tesla C2070.
The problem appeared only on the (non-reference) 3GB Palit card. Can the
original poster confirm whether the successful allocations were also on Win7/64?
C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\C\bin\win64\Release\deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
There are 2 devices supporting CUDA
Device 0: "GeForce GTX 580"
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 3181838336 bytes
Multiprocessors x Cores/MP = Cores: 16 (MP) x 32 (Cores/MP) = 512 (Cores)
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.57 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device 1: "Quadro 2000"
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 1041825792 bytes
Multiprocessors x Cores/MP = Cores: 4 (MP) x 48 (Cores/MP) = 192 (Cores)
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.25 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, CUDA Runtime Version = 3.20, NumDevs = 2, Device = GeForce GTX 580, Device = Quadro 2000
PASSED
Press <Enter> to Quit...
-----------------------------------------------------------
Hello Tooney,
Since the 1.5GB of memory on the standard GTX 580 is so limiting (even
for games), these 3GB cards are more interesting. Apart from that
large-allocation issue, has your 3GB Palit card performed as expected?
I just got my hands on a Gainward Phantom GTX 580 3GB. I have not seen the problem described above: I can allocate a single array of 3GB. I’m running 64-bit Linux with the 260.19.21 drivers.
Cheers
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
There are 2 devices supporting CUDA
Device 0: "GeForce GTX 580"
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 3220897792 bytes
Multiprocessors x Cores/MP = Cores: 16 (MP) x 32 (Cores/MP) = 512 (Cores)
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.57 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device 1: "GeForce 9600 GT"
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major/Minor version number: 1.1
Total amount of global memory: 536543232 bytes
Multiprocessors x Cores/MP = Cores: 8 (MP) x 8 (Cores/MP) = 64 (Cores)
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.75 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: No
Device has ECC support enabled: No
Device is using TCC driver mode: No
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, CUDA Runtime Version = 3.20, NumDevs = 2, Device = GeForce GTX 580, Device = GeForce 9600 GT
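For anyone comparing the two deviceQuery dumps in this thread, here is a small sketch of the arithmetic on the reported global memory sizes. The byte counts are copied from the outputs above; the helper `to_mib` and the variable names are my own. Note that the two readings come from different cards (Palit on Windows, Gainward on Linux), so the gap between them should not be read as a pure OS or driver effect.

```python
MIB = 1024 * 1024


def to_mib(nbytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1048576 bytes)."""
    return nbytes / MIB


windows_total = 3181838336  # GTX 580 3GB, Windows 7 64-bit deviceQuery
linux_total = 3220897792    # GTX 580 3GB, 64-bit Linux deviceQuery

if __name__ == "__main__":
    print(to_mib(windows_total))                # 3034.4375
    print(to_mib(linux_total))                  # 3071.6875
    print(to_mib(linux_total - windows_total))  # 37.25
    # The 11 x 256MB allocations reported earlier fit within the Windows total.
    print(11 * 256 <= to_mib(windows_total))    # True
```

So the Windows system reports about 37 MiB less global memory than the Linux one, and the 2816MB that was successfully allocated in pieces leaves roughly 218 MiB of the reported Windows total unused.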