As can be seen at the previous URL, the GPU consists of 480 (240x2) processor cores and 1792MB (896x2) of memory.
However, if I run the “deviceQuery.exe” program that comes with the NVIDIA GPU Computing SDK, the result is as follows:
So it reports that the GPU is a “GeForce GTX 295”, but it shows only 240 processor cores and 869MB of memory! Indeed, if I try to load more than 869MB onto the GPU, the program crashes.
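Whatever the root cause of the missing memory, an allocation larger than what the device has should fail with an error rather than crash later. A minimal sketch, assuming a reasonably recent CUDA runtime: it queries free memory with `cudaMemGetInfo` and checks the return code of `cudaMalloc`. The helper name `allocateOrReport` and the 1000 MiB request size are illustrative assumptions, not part of the SDK.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: allocate `bytes` on the current device, or return
// nullptr with a readable message instead of letting a later access crash.
float* allocateOrReport(size_t bytes) {
    size_t freeBytes = 0, totalBytes = 0;
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return nullptr;
    }
    std::printf("free %zu MiB of %zu MiB on this device\n", freeBytes >> 20, totalBytes >> 20);
    if (bytes > freeBytes) {
        std::fprintf(stderr, "request of %zu MiB exceeds free device memory\n", bytes >> 20);
        return nullptr;
    }
    // Even below the free total, cudaMalloc can still fail (for example due
    // to fragmentation), so its return code is checked as well.
    float* d_buf = nullptr;
    err = cudaMalloc(reinterpret_cast<void**>(&d_buf), bytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return nullptr;
    }
    return d_buf;
}

int main() {
    // Roughly 1000 MiB: more than a single GTX 295 half can hold.
    const size_t request = static_cast<size_t>(1000) << 20;
    float* d_buf = allocateOrReport(request);
    if (d_buf == nullptr) return 1;
    cudaFree(d_buf);
    return 0;
}
```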
So the question is: why doesn't CUDA detect the correct specifications of the GTX 295? Is it a bug? Is there any solution?
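To see how the CUDA runtime itself enumerates the card, a short sketch using `cudaGetDeviceCount` and `cudaGetDeviceProperties` can list every device with its memory and multiprocessor count. A GTX 295 is two GPUs on one board, so a working setup should list it twice. On compute capability 1.x each multiprocessor has 8 cores, so 30 SMs correspond to the 240 cores deviceQuery prints. `countDevicesNamed` is a hypothetical helper, and the name string "GeForce GTX 295" is assumed to match what the driver reports.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: count devices whose name equals `name` exactly.
// Returns 0 if the CUDA runtime or driver is unavailable.
int countDevicesNamed(const char* name) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;
    int matches = 0;
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, d) == cudaSuccess &&
            std::strcmp(prop.name, name) == 0)
            ++matches;
    }
    return matches;
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, d) != cudaSuccess) continue;
        // totalGlobalMem is per device, i.e. per GPU half of a GTX 295.
        std::printf("device %d: %s, %zu MiB, %d SMs\n", d, prop.name,
                    prop.totalGlobalMem >> 20, prop.multiProcessorCount);
    }
    std::printf("GTX 295 GPUs visible: %d\n", countDevicesNamed("GeForce GTX 295"));
    return 0;
}
```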
What you should see is two such devices. Basically, it seems your second device is not detected. While I don't have much experience with Windows in these matters, I think I remember that if you activate SLI, CUDA cannot detect both devices (probably because with SLI enabled Windows thinks there is only one device, so CUDA can only find one). CUDA cannot use the SLI bridge.
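Even once both halves are detected, they remain two separate devices with separate memory, so a single allocation cannot span them; data larger than one half has to be split. A minimal sketch, assuming one host thread per device (the usual pattern with CUDA 3.x runtimes, and compatible with the exclusive compute mode shown in the listing below). `chunkOf` and `uploadChunk` are hypothetical helpers, the kernel launch is left as a comment, and the code assumes nvcc with C++11 or later (on Linux, `-pthread` may be needed for `std::thread`).

```cuda
#include <cstdio>
#include <cstddef>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Split n elements into `parts` nearly equal contiguous chunks; chunk i is
// [begin, begin + count). The first n % parts chunks get one extra element.
void chunkOf(size_t n, int parts, int i, size_t* begin, size_t* count) {
    size_t base = n / parts, extra = n % parts;
    size_t ui = static_cast<size_t>(i);
    *count = base + (ui < extra ? 1 : 0);
    *begin = ui * base + (ui < extra ? ui : extra);
}

// Runs in its own host thread: bind to `device` and copy this device's chunk.
void uploadChunk(int device, const float* host, size_t begin, size_t count,
                 cudaError_t* status) {
    *status = cudaSetDevice(device);
    if (*status != cudaSuccess) return;
    float* d_chunk = nullptr;
    *status = cudaMalloc(reinterpret_cast<void**>(&d_chunk), count * sizeof(float));
    if (*status != cudaSuccess) return;
    *status = cudaMemcpy(d_chunk, host + begin, count * sizeof(float),
                         cudaMemcpyHostToDevice);
    // ... launch kernels on d_chunk here ...
    cudaFree(d_chunk);
}

int main() {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        std::fprintf(stderr, "no CUDA devices found\n");
        return 1;
    }
    // 300 Mi floats = 1200 MiB: too much for one 896MB half, fine when split.
    const size_t n = static_cast<size_t>(300) << 20;
    std::vector<float> host(n, 1.0f);
    std::vector<cudaError_t> status(deviceCount, cudaSuccess);
    std::vector<std::thread> workers;
    for (int d = 0; d < deviceCount; ++d) {
        size_t begin = 0, count = 0;
        chunkOf(n, deviceCount, d, &begin, &count);
        workers.emplace_back(uploadChunk, d, host.data(), begin, count, &status[d]);
    }
    for (auto& t : workers) t.join();
    for (int d = 0; d < deviceCount; ++d)
        std::printf("device %d: %s\n", d, cudaGetErrorString(status[d]));
    return 0;
}
```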
Cheers
Ceearem
P.S. This is what the output looks like on one of our Linux machines with two GTX 295s and an 8400 GS for the X server (though right now none is running…)
/usr/app-soft/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
There are 5 devices supporting CUDA
Device 0: "GeForce GTX 295"
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 939327488 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.38 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Exclusive (only one host thread at a time can use this device)
Concurrent kernel execution: No
Device has ECC support enabled: No
Device 1: "GeForce 8400 GS"
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 536150016 bytes
Number of multiprocessors: 1
Number of cores: 8
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.62 GHz
Concurrent copy and execution: No
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: No
Device has ECC support enabled: No
Device 2: "GeForce GTX 295"
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 939327488 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.38 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Exclusive (only one host thread at a time can use this device)
Concurrent kernel execution: No
Device has ECC support enabled: No
Device 3: "GeForce GTX 295"
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 939327488 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.38 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Exclusive (only one host thread at a time can use this device)
Concurrent kernel execution: No
Device has ECC support enabled: No
Device 4: "GeForce GTX 295"
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 939327488 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.38 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Exclusive (only one host thread at a time can use this device)
Concurrent kernel execution: No
Device has ECC support enabled: No
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, CUDA Runtime Version = 3.20, NumDevs = 5, Device = GeForce GTX 295, Device = GeForce 8400 GS
PASSED
Press <Enter> to Quit...
-----------------------------------------------------------
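With a small display GPU in the same machine, as in the listing above, it is often preferable to keep compute work on the GTX 295 halves only. A minimal sketch that selects devices by multiprocessor count and skips any device in prohibited compute mode; `pickComputeDevices` and the 30-SM threshold are illustrative assumptions chosen to match the listing (30 SMs per GTX 295 half versus 1 for the 8400 GS).

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: return the indices whose SM count is at least minSMs.
std::vector<int> pickComputeDevices(const std::vector<int>& smCounts, int minSMs) {
    std::vector<int> picked;
    for (int i = 0; i < static_cast<int>(smCounts.size()); ++i)
        if (smCounts[i] >= minSMs) picked.push_back(i);
    return picked;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) count = 0;
    std::vector<int> smCounts;
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, d) != cudaSuccess) {
            smCounts.push_back(0);  // unreadable device: never picked
            continue;
        }
        // A device in prohibited compute mode cannot be used at all,
        // so it is recorded as having no usable SMs.
        smCounts.push_back(prop.computeMode == cudaComputeModeProhibited
                               ? 0 : prop.multiProcessorCount);
    }
    for (int d : pickComputeDevices(smCounts, 30))
        std::printf("using device %d\n", d);
    return 0;
}
```

On the machine in the listing above, this would pick devices 0, 2, 3 and 4 and leave the 8400 GS (device 1) to the X server.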
I can’t remember exactly what the option is named, but there’s something you need to set in the nVidia dialog in Control Panel. Something about acceleration or PhysX, if I remember correctly (my old development machine had 2 GTX295 boards in it, and I ran into the same problem).