Only 112 cores on Tesla C2070!

Dear List,

We have a Tesla C2070 running on Fedora 13
with the recent NVIDIA drivers for CUDA 3.2 (260.19.14).

When I run “deviceQuery”, I get the following info:

There is 1 device supporting CUDA

Device 0: “Tesla C2070”
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 1341587456 bytes
Number of multiprocessors: 14
Number of cores: 112
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.15 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)

To my great surprise, only 112 cores are detected,
instead of the 448 I expected.
The total amount of global memory is also much less than the 6 GB expected.

It's more or less as if only 1/4 of the cores are detected.

How can I understand this?

It looks like you are running an old version of deviceQuery. The CUDA APIs only return the number of multiprocessors, not the number of cores, so the original deviceQuery code had 8 cores per multiprocessor hard-coded into it. For Fermi cards that is incorrect: there are either 32 or 48 cores per MP. That is where the factor of four comes from. You can safely ignore the discrepancy; it is not real.
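
For reference, here is a minimal sketch (not the actual SDK source) of how a fixed-up deviceQuery can derive the core count itself. The runtime really does report only the multiprocessor count, so the cores-per-MP mapping has to be maintained by hand; the coresPerMP helper below is a made-up name that just encodes the 8/32/48 figures above.

#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical helper (not part of the CUDA API): cores per
 * multiprocessor as a function of compute capability. */
static int coresPerMP(int major, int minor)
{
    if (major == 1) return 8;                  /* Tesla architecture (sm_1x) */
    if (major == 2) return (minor == 0) ? 32   /* Fermi GF100 (sm_20)        */
                                        : 48;  /* Fermi GF104 (sm_21)        */
    return -1;                                 /* unknown architecture       */
}

int main(void)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    int perMP = coresPerMP(prop.major, prop.minor);
    printf("Device 0: \"%s\"\n", prop.name);
    printf("Multiprocessors: %d\n", prop.multiProcessorCount);
    if (perMP > 0)
        printf("CUDA cores: %d (%d per MP)\n",
               prop.multiProcessorCount * perMP, perMP);
    /* A C2070 (compute 2.0) reports 14 MPs, so 14 * 32 = 448 cores. */
    return 0;
}

Compiled with nvcc, on a C2070 this should print 14 * 32 = 448 cores rather than the hard-coded 14 * 8 = 112.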

Ok, thanks,

Indeed, I was using an old version of deviceQuery that I had wrapped for Python.

The updated version now works perfectly.

Yves

Did it also fix the memory discrepancy?