Does the 3GB Palit GTX580 have memory problems?

Tooney · February 7, 2011, 10:30am

Hello everyone,

I got a Palit GTX580 with 3GB of memory (http://www.palit.biz/main/vgapro.php?id=1503) and found that I can't allocate 1GB of memory for a single variable.

I ran the same test on a GTX480, a standard GTX580, and a Tesla C2070, and none of them failed. The 3GB on my GTX580 is accessible when many variables are used, e.g. allocating 11 variables of 256MB each (= 2816MB) succeeds. So the 3GB of memory itself does not seem to be broken.

Does anyone know why this happens?

OS: Windows 7 64-bit
Driver Version: 263.14 (from the accessory CD), 266.58 (from the NVIDIA website)

Thank you,

cbuchner1 · February 7, 2011, 10:45am

The WDDM driver model introduced with Windows Vista is responsible for this.

Switch to Linux or Windows XP 64 bit to lift these restrictions.

Alternatively use a Tesla card in TCC mode.

Christian

Tooney · February 8, 2011, 1:20am

Hello cbuchner1,

Is it really because of WDDM? If that were true, the other GPUs should not be able to allocate 1GB of memory for a single variable either.

SPWorley · February 8, 2011, 1:31am

Yes.

The GPUs themselves have no problem with such allocations, though. As Christian says, it's Windows 7's WDDM driver model that limits them, not the hardware.

nnunn · February 8, 2011, 12:19pm

Sounds like the 1GB alloc worked with GTX480, GTX580 normal, and Tesla C2070.
Only on the (non-reference) 3GB Palit card did the problem appear. Can the
original poster confirm or deny the successful allocs were also on Win7/64?

Tooney · February 10, 2011, 10:40am

Yes, only the Palit GTX580 with 3GB fails to allocate 1GB of memory in a single allocation (all tests were on Win7/64-bit).

I still can’t understand how WDDM works with GPU memory…

DrAnderson42 · February 10, 2011, 12:48pm

Check the release notes:

http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_Toolkit_Release_Notes_Windows.txt

Sarnath · February 10, 2011, 1:33pm

But none of the explanations above accounts for why the problem strikes only this GTX580 and not the other cards…
Tooney,
Can you show us the deviceQuery output?

nnunn · February 10, 2011, 10:51pm

The question is: why does it work fine on a normal GTX 580 but fail on the non-standard (3 GB) Palit card?

Anyone know how Palit/Gainward have implemented the extra memory?

Tooney · February 14, 2011, 7:08am

DrAnderson42,
I know about that limitation, but I don't think it's the reason.

Sarnath,
OK, I'll upload the output in a few days.

nnunn,
Yes, that’s what I asked.

Thank you,

Tooney · February 16, 2011, 3:43am

Hello Sarnath,

Here is the Device Query output.

C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\C\bin\win64\Release\deviceQuery.exe Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

There are 2 devices supporting CUDA

Device 0: "GeForce GTX 580"
  CUDA Driver Version:                           3.20
  CUDA Runtime Version:                          3.20
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 3181838336 bytes
  Multiprocessors x Cores/MP = Cores:            16 (MP) x 32 (Cores/MP) = 512 (Cores)
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Clock rate:                                    1.57 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     No
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution:                   Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No

Device 1: "Quadro 2000"
  CUDA Driver Version:                           3.20
  CUDA Runtime Version:                          3.20
  CUDA Capability Major/Minor version number:    2.1
  Total amount of global memory:                 1041825792 bytes
  Multiprocessors x Cores/MP = Cores:            4 (MP) x 48 (Cores/MP) = 192 (Cores)
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Clock rate:                                    1.25 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     No
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution:                   Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, CUDA Runtime Version = 3.20, NumDevs = 2, Device = GeForce GTX 580, Device = Quadro 2000

PASSED

Press <Enter> to Quit...

-----------------------------------------------------------
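
As a quick sanity check on the numbers reported so far, here is a minimal Python sketch that compares the two allocation patterns against the global memory size in the deviceQuery output. It assumes "MB" in the thread means MiB (2**20 bytes); the variable names are illustrative only. It does only arithmetic and does not touch the GPU.

```python
# Sizes taken from the thread; treating "MB" as MiB is an assumption.
MIB = 1024 ** 2
total_bytes = 3181838336        # "Total amount of global memory" for the GTX 580

many_small = 11 * 256 * MIB     # the pattern that succeeded: 11 variables of 256MB
one_large = 1024 * MIB          # the single allocation that failed

total_mib = total_bytes / MIB
print(f"total global memory: {total_mib:.1f} MiB")
print(f"11 x 256 MiB = {many_small // MIB} MiB, fits: {many_small <= total_bytes}")
print(f"1 x 1024 MiB = {one_large // MIB} MiB, fits: {one_large <= total_bytes}")
```

Both requests are smaller than the reported total, so the total size alone does not explain why the single 1GB allocation fails.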

nnunn · February 18, 2011, 3:46pm

Hello Tooney,
Since the 1.5GB of memory on a standard GTX 580 is so limiting (even
for games), these 3GB cards are more interesting. Apart from that
large alloc issue, has your 3GB Palit card performed as expected?

moozoo · February 19, 2011, 5:05am

I don't suppose you could also give us the OpenCL device information,
in particular CL_DEVICE_MAX_MEM_ALLOC_SIZE?

GPU Caps Viewer will show this. Click Tools, then Full XML Export, and then copy and paste the text of the OpenCL section.
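
For context on why CL_DEVICE_MAX_MEM_ALLOC_SIZE is worth checking: the OpenCL 1.1 specification only requires it to be at least max(1/4 of CL_DEVICE_GLOBAL_MEM_SIZE, 128MB). The Python sketch below computes that lower bound for the global memory size in the deviceQuery output above. The helper name is made up for illustration, and the value this driver actually reports is not known from the thread.

```python
MIB = 1024 ** 2

def spec_min_max_alloc(global_mem_bytes):
    """Lower bound that OpenCL 1.1 places on CL_DEVICE_MAX_MEM_ALLOC_SIZE (hypothetical helper)."""
    return max(global_mem_bytes // 4, 128 * MIB)

gtx580_3gb = 3181838336  # global memory reported by deviceQuery for the 3GB card
floor_bytes = spec_min_max_alloc(gtx580_3gb)
print(f"{floor_bytes} bytes = {floor_bytes / MIB:.1f} MiB")
print("below 1 GiB:", floor_bytes < 1024 * MIB)
```

A driver may report more than this floor, and whether cudaMalloc under WDDM follows the same limit is not established here; the reported value would have to be read from the card itself.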

gpgpu_apprentice · March 3, 2011, 9:07am

Hello.

I just got my hands on a Gainward Phantom GTX 580 3GB. I have not seen the problem stated above. I can allocate a single array of 3GB size. I’m running Linux 64bit with 260.19.21 drivers.

Cheers

./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

There are 2 devices supporting CUDA

Device 0: “GeForce GTX 580”
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 3220897792 bytes
Multiprocessors x Cores/MP = Cores: 16 (MP) x 32 (Cores/MP) = 512 (Cores)
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.57 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No

Device 1: “GeForce 9600 GT”
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major/Minor version number: 1.1
Total amount of global memory: 536543232 bytes
Multiprocessors x Cores/MP = Cores: 8 (MP) x 8 (Cores/MP) = 64 (Cores)
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.75 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: No
Device has ECC support enabled: No
Device is using TCC driver mode: No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, CUDA Runtime Version = 3.20, NumDevs = 2, Device = GeForce GTX 580, Device = GeForce 9600 GT

PASSED

Press <Enter> to Quit…
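
To compare the memory sizes in deviceQuery dumps like the ones in this thread without reading them by eye, a small parser can help. This is a minimal Python sketch that assumes the SDK 3.2 output format shown above; the function name and the shortened sample text are illustrative only.

```python
import re

def global_memory_by_device(text):
    """Map each device name to its 'Total amount of global memory' in bytes."""
    result = {}
    current = None
    for line in text.splitlines():
        # Device header; accepts straight or curly quotes around the name.
        name = re.match(r'\s*Device \d+: ["“](.+?)["”]', line)
        if name:
            current = name.group(1)
            continue
        mem = re.search(r"Total amount of global memory:\s*(\d+) bytes", line)
        if mem and current is not None:
            result[current] = int(mem.group(1))
    return result

# Shortened sample using values from the Linux output above.
sample = """\
Device 0: "GeForce GTX 580"
  Total amount of global memory:                 3220897792 bytes
Device 1: "GeForce 9600 GT"
  Total amount of global memory:                 536543232 bytes
"""
sizes = global_memory_by_device(sample)
for name, size in sizes.items():
    print(f"{name}: {size / 1024**2:.1f} MiB")
```

The Gainward card under Linux reports 3220897792 bytes and the Palit card under Windows 7 reports 3181838336 bytes. These are different boards on different drivers, so that gap by itself says nothing definite about the failing allocation.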
