Tesla C2050 card shows 112 cores instead of 448

Dear Users,

According to NVIDIA, the Tesla C2050 GPU has 448 cores. But the following output shows only 112 cores:

#lib/gpu/nvc_get_devices

Device 0: “Tesla C2050”
Revision number: 2.0
Total amount of global memory: 2.62 GB
Number of multiprocessors: 14
Number of cores: 112
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.15 GHz
Concurrent copy and execution: Yes

Is there any difference between a CUDA core and a normal core (as highlighted above)?

Also, there are no sample programs inside the CUDA Toolkit. Does anybody have programs that compare GPU and CPU performance?

Thanks

That is just a problem with the deviceQuery example. Previous generations of cards had 8 cores per multiprocessor, whereas the GF100 cards have 32 cores per multiprocessor. Because the CUDA API doesn’t (yet) report the core count, only the multiprocessor count, that code uses a fixed constant of 8 cores per multiprocessor. So it reports 14 * 8 = 112 instead of 14 * 32 = 448 as it should. There is nothing wrong with your card, and you have not misunderstood the specifications.
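
The arithmetic in the reply above can be written out as a small sketch. It is only an illustration, not the deviceQuery source: the table and the function names `total_cores` and `legacy_cores` are hypothetical, the table lists only the two cases mentioned in this thread (8 cores per multiprocessor for compute capability 1.x, 32 for 2.0), and the GTX 465 multiprocessor count of 11 is inferred from the 88 cores it was reported with. In a real CUDA program the inputs would come from the `major`, `minor`, and `multiProcessorCount` fields of `cudaDeviceProp`.

```python
# Cores per multiprocessor, keyed by compute capability (major, minor).
# Assumption: only the cases discussed in this thread are listed.
CORES_PER_SM = {
    (1, 0): 8, (1, 1): 8, (1, 2): 8, (1, 3): 8,  # pre-GF100 generations
    (2, 0): 32,                                   # GF100 (e.g. Tesla C2050)
}


def total_cores(major, minor, multiprocessors):
    """Return multiprocessors * cores-per-multiprocessor.

    Returns None when the compute capability is not in the table,
    rather than guessing a value.
    """
    per_sm = CORES_PER_SM.get((major, minor))
    if per_sm is None:
        return None
    return multiprocessors * per_sm


def legacy_cores(multiprocessors):
    """What a tool with a hard-coded 8 cores per multiprocessor prints."""
    return multiprocessors * 8


if __name__ == "__main__":
    # (name, multiprocessor count); both cards are compute capability 2.0.
    for name, sms in (("Tesla C2050", 14), ("GTX 465", 11)):
        print(name,
              "hard-coded:", legacy_cores(sms),
              "by capability:", total_cores(2, 0, sms))
```

Returning `None` for an unknown capability is a deliberate choice here: a fixed fallback constant is exactly what produced the misleading 112.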

Thanks.

We’re facing one more problem, this time with LAMMPS, an application that has been ported to CUDA. It was installed successfully, following the instructions given in this PDF:

http://lammps.sandia.gov/workshops/Feb10/M…own/gpu_tut.pdf

But it failed with the following error:

cat gpu_test_lj_out

lmp_openmpi: pair_gpu_cell.cu:486: void build_cell_list(double*, int*, cell_list&, int, int, int, int, int, int, int, int): Assertion `err == cudaSuccess’ failed.

LAMMPS (7 Apr 2010)

Lattice spacing in x,y,z = 1.6796 1.6796 1.6796

Created orthogonal box = (0 0 0) to (33.5919 33.5919 33.5919)

1 by 1 by 1 processor grid

Created 32000 atoms


  • Using GPGPU acceleration for LJ-Cut:

GPU 0: Tesla C2050, 112 cores, 2.6 GB, 1.1 GHZ


Setting up run …

[nvidia:13239] *** Process received signal ***

[nvidia:13239] Signal: Aborted (6)

[nvidia:13239] Signal code: (-6)

[nvidia:13239] [ 0] /lib64/libpthread.so.0 [0x359be0e4c0]

[nvidia:13239] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x359b230215]

[nvidia:13239] [ 2] /lib64/libc.so.6(abort+0x110) [0x359b231cc0]

[nvidia:13239] [ 3] /lib64/libc.so.6(__assert_fail+0xf6) [0x359b229696]

[nvidia:13239] [ 4] …/…/src/lmp_openmpi [0x7275cc]

[nvidia:13239] [ 5] …/…/src/lmp_openmpi(_Z12_lj_gpu_cellIffEdR13LJ_GPU_MemoryIT_T0_EPPdS5_S6_PiiiibbPKdS9_+0x127) [0x7205b7]

[nvidia:13239] [ 6] …/…/src/lmp_openmpi(_Z11lj_gpu_cellPPdS_S0_PiiiibbPKdS3_+0x4f) [0x71bd6f]

[nvidia:13239] [ 7] …/…/src/lmp_openmpi(_ZN9LAMMPS_NS12PairLJCutGPU7computeEii+0xb1) [0x6a0861]

[nvidia:13239] [ 8] …/…/src/lmp_openmpi(_ZN9LAMMPS_NS6Verlet5setupEv+0x12a) [0x71771a]

[nvidia:13239] [ 9] …/…/src/lmp_openmpi(_ZN9LAMMPS_NS3Run7commandEiPPc+0x449) [0x6f5919]

[nvidia:13239] [10] …/…/src/lmp_openmpi(_ZN9LAMMPS_NS5Input15execute_commandEv+0xe0e) [0x5fd45e]

[nvidia:13239] [11] …/…/src/lmp_openmpi(_ZN9LAMMPS_NS5Input4fileEv+0x34d) [0x5fe60d]

[nvidia:13239] [12] …/…/src/lmp_openmpi(main+0x4a) [0x604c7a]

[nvidia:13239] [13] /lib64/libc.so.6(__libc_start_main+0xf4) [0x359b21d974]

[nvidia:13239] [14] …/…/src/lmp_openmpi(__gxx_personality_v0+0x421) [0x4600a9]

[nvidia:13239] *** End of error message ***

The input file is:

cat in.lj

# 3d Lennard-Jones melt

newton off

variable x index 1

variable y index 1

variable z index 1

variable xx equal 20*$x

variable yy equal 20*$y

variable zz equal 20*$z

units lj

atom_style atomic

lattice fcc 0.8442

region box block 0 ${xx} 0 ${yy} 0 ${zz}

create_box 1 box

create_atoms 1 box

mass 1 1.0

velocity all create 1.44 87287 loop geom

pair_style lj/cut/gpu one/node 0 2.5

pair_coeff 1 1 1.0 1.0 2.5

neighbor 0.3 bin

neigh_modify delay 0 every 20 check no

fix 1 all nve

run 100

What is wrong here?

The OS is RHEL 5.3, 64-bit, but the CUDA toolkit package is built for RHEL 5.4, 64-bit (cudatoolkit_3.1_linux_64_rhel5.4.run). Could this mismatch be causing the failure?

Has anybody successfully run this application on a GPU?

Thanks once again
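
As a sanity check, the setup lines of the log above follow directly from in.lj, which suggests the input was parsed correctly and the abort happens later, in the first GPU force computation (the trace passes through Verlet::setup and PairLJCutGPU::compute). A minimal sketch, assuming that under `units lj` the `lattice fcc` argument is the reduced density and that an fcc unit cell holds 4 atoms; the function names are hypothetical:

```python
def fcc_lattice_constant(reduced_density):
    """Cubic cell edge a for fcc: density = 4 atoms / a**3."""
    return (4.0 / reduced_density) ** (1.0 / 3.0)


def fcc_box(reduced_density, cells_per_side):
    """Return (lattice spacing, box edge length, atom count) for a cubic box."""
    a = fcc_lattice_constant(reduced_density)
    edge = a * cells_per_side
    atoms = 4 * cells_per_side ** 3
    return a, edge, atoms


if __name__ == "__main__":
    # in.lj: lattice fcc 0.8442, and xx = yy = zz = 20*1 = 20 cells per side
    spacing, edge, atoms = fcc_box(0.8442, 20)
    print("lattice spacing:", round(spacing, 4))
    print("box edge:", round(edge, 4))
    print("atoms:", atoms)
```

The values agree with the log (spacing 1.6796, box edge 33.5919, 32000 atoms), so the problem is unlikely to be in the input file itself.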

I had the same issue with a couple of GTX 465 cards (reporting 88 instead of 352 cores). When I upgraded from the 2.3 Toolkit and SDK to the 3.1 Toolkit and SDK and reran deviceQuery, it got it right.

I guess the latest toolkit knows how many cores per multiprocessor this device has.
