G210, GT220 deviceQuery?

Now that the G210 and GT220 are about to hit retail, would somebody care to post the output from ‘deviceQuery’ for those cards?

Are these going to be the first compute 1.2 capable cores in the wild, or are they just G9x cores shrunk onto TSMC’s 40 nm process?

From gpugrid.net forums

02/09/2009 19:28:49 CUDA device: GeForce GT 220 (driver version 19062, compute capability 1.2, 1024MB, est. 23GFLOPS)

Voila, it’s Compute 1.2

I am pretty dissatisfied with nVidia’s information policy regarding the newer laptop and OEM/mid range chips. It’s really hard to get any reliable information before product launch day.

Christian

OK, so the GT220 is compute 1.2 and therefore has more registers. That’s great! How about “Support host page-locked memory mapping” and “Concurrent copy and execution” then?

The G210 has 16 shaders (rather than an expected 24) which would be consistent with a die shrink of a 9400GT and compute 1.1 but … what is it actually?

All compute 1.2 devices should support zero-copy and async memcpys. (I haven’t actually checked this; I’m going from memory, but I’m pretty sure I’m right.)

Well, could you? I am not asking for a big release with trumpets and elephants, not even a candlelight moment - just a deviceQuery so the card won’t be buried in total silence.

I’d like to ask for the fanfares and trumpets and for the detailed CUDA compute specs to be included on the product page of each chip.

Just stating DirectX capability, number of shaders, clock rates is not enough for us coders. I’d be eternally grateful if you could forward this request to marketing and/or the web design guys.

Add a small “CUDA” section to the Technical Specifications page.

Christian

And more importantly - will current Linux and Windows CUDA drivers recognize the PCI device IDs of these cards?

Or will one have to install unofficial or Beta drivers?

Searching the nVidia driver download page for a driver supporting GT220 on 32 bit Linux comes up empty, even when including Beta drivers. Sigh

Christian

GT220 is supported in 190.36 and has VDPAU feature set ‘C’:

GeForce GT 220      0x0A20  C
GeForce GT 230M     0x0A2A  C
GeForce GT 240M     0x0A34  C
GeForce G210        0x0A60  C
GeForce G210M       0x0A74  C
GeForce GTS 260M    0x0CA8  C
GeForce GTS 250M    0x0CA9  C

ftp://download.nvidia.com/XFree86/Linux-x…appendix-a.html

Just bought a cheap GT220 model with 1GB of DDR2 memory for 58 Euros. Will post the deviceQuery string tomorrow or so.

Christian

Taking a bullet for the team?

Purely for egoistic reasons. I am running out of registers on the G80/G92 architecture and wanted to get Compute 1.2 on the cheap.

The GT220 hasn’t exactly been setting the world on fire in the gaming benchmark stakes, but it should make a great CUDA development card. 48 compute capability 1.2 cores should perform pretty well on compute-bound tasks with all that extra register file space and niceties like shared memory atomics.

Dear nVidia, you’ve got to be kidding me. The card has been in retail channels all week. We need working Linux drivers, please.

(II) NVIDIA dlloader X Driver  190.36  Wed Sep 23 07:47:56 PDT 2009

(II) NVIDIA Unified Driver for all Supported NVIDIA GPUs

(II) Primary Device is: 

(EE) No devices detected.

UPDATE:

Dear nVidia, I apologize.

xorg expected me to specify a BusID in the device section, probably because I had two cards in the machine and it could not determine on its own which one to use. The driver works now.
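For anyone who hits the same "No devices detected" message with two cards installed: one detail that is easy to trip over is that lspci prints PCI addresses in hexadecimal, while the BusID entry in xorg.conf is written as decimal "PCI:bus:device:function". Below is a minimal sketch of that conversion. The helper name and the sample addresses are made up for illustration and are not taken from the machine described above.

```python
import re


def lspci_to_xorg_busid(address: str) -> str:
    """Convert an lspci address such as '0a:00.0' into an xorg.conf BusID string.

    lspci prints bus, device and function in hex; xorg.conf expects decimal.
    An optional leading PCI domain ('0000:') is accepted and ignored.
    """
    pattern = r"(?:[0-9a-fA-F]{4}:)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])"
    match = re.fullmatch(pattern, address.strip())
    if match is None:
        raise ValueError(f"not an lspci address: {address!r}")
    bus, device, function = (int(part, 16) for part in match.groups())
    return f"PCI:{bus}:{device}:{function}"


if __name__ == "__main__":
    # Hypothetical addresses, only to show the hex-to-decimal step.
    for addr in ("01:00.0", "0a:00.0"):
        print(addr, "->", lspci_to_xorg_busid(addr))
```

The resulting string would go into the BusID line of the Device section for the card that should drive the display.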

Device 0: "GeForce GT 220"
  CUDA Driver Version:                           2.30
  CUDA Runtime Version:                          2.30
  CUDA Capability Major revision number:         1
  CUDA Capability Minor revision number:         2
  Total amount of global memory:                 1073414144 bytes
  Number of multiprocessors:                     6
  Number of cores:                               48
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          262144 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.36 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     Yes
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)

For comparison, here is my nVidia 8500 GT in a PCI slot (not PCI-express), which is now the secondary CUDA device. Only the fields that differ are shown. I should probably swap the display card so that I get unlimited run time for kernels on the faster device.

  CUDA Capability Minor revision number:         1
  Total amount of global memory:                 268173312 bytes
  Number of multiprocessors:                     2
  Number of cores:                               16
  Total number of registers available per block: 8192
  Clock rate:                                    0.92 GHz
  Run time limit on kernels:                     No
  Support host page-locked memory mapping:       No
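To put the register numbers in perspective, here is a rough sketch of how many threads can be resident on one multiprocessor when registers are the only limit. It uses the register counts from the two deviceQuery listings (8192 vs 16384) and the resident-thread limits per multiprocessor for compute 1.1 (768) and compute 1.2 (1024). The figure of 20 registers per thread is a made-up example kernel, and the model deliberately ignores register allocation granularity, shared memory and block-size limits, so treat the result as an estimate only.

```python
def resident_threads(regs_per_thread, regs_per_mp, max_threads_per_mp, warp_size=32):
    """Estimate resident threads per multiprocessor when only registers limit occupancy.

    Simplified model: rounds down to whole warps and ignores allocation
    granularity, shared memory usage and per-block limits.
    """
    by_registers = regs_per_mp // regs_per_thread
    warps = min(by_registers, max_threads_per_mp) // warp_size
    return warps * warp_size


# (registers per MP, max resident threads per MP) for the two cards above.
DEVICES = {
    "8500 GT (compute 1.1)": (8192, 768),
    "GT 220 (compute 1.2)": (16384, 1024),
}

if __name__ == "__main__":
    regs_per_thread = 20  # hypothetical kernel, not measured
    for name, (regs, max_threads) in DEVICES.items():
        threads = resident_threads(regs_per_thread, regs, max_threads)
        print(f"{name}: {threads} threads, occupancy {threads / max_threads:.0%}")
```

Under these assumptions the same kernel keeps roughly twice as many threads in flight per multiprocessor on the compute 1.2 part, which is the whole point of buying it.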

And don’t get a card equipped with DDR2 memory (16 GB/sec). At minimum you want GDDR2 (25 GB/sec) or even DDR3 (32 GB/sec).

Device to Device Bandwidth
Transfer Size (Bytes)   Bandwidth(MB/s)
33554432                13844.9

Argh ;) That’s what I get for buying cheap.

OK! Ordering a Gigabyte GT220 with DDR3

Thanks!

EDIT: It looks like DDR3 is 25 GB/sec as well. It is GDDR3 that maxes out at 32 GB/sec (and also draws a lot more power).

Let me know if anyone finds a model that is equipped with GDDR3 and offers the 32GB/sec throughput.

For memory bandwidth limited apps, this indeed makes a difference.

UPDATE: Seems that I am effectively able to squeeze about 50 GFlops out of my card with my application.

Christian
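For context on the 50 GFlops figure, here is a back-of-the-envelope peak estimate from the deviceQuery values above (48 cores at 1.36 GHz). It assumes each core retires one multiply-add, counted as 2 flops, per clock. That is a common convention, not a measured number, and it leaves out any additional dual-issued multiply.

```python
def peak_gflops(cores, clock_ghz, flops_per_core_per_clock=2):
    """Theoretical single-precision peak, assuming one multiply-add (2 flops) per core per clock."""
    return cores * clock_ghz * flops_per_core_per_clock


if __name__ == "__main__":
    peak = peak_gflops(cores=48, clock_ghz=1.36)  # values from deviceQuery
    achieved = 50.0                               # figure reported in the post above
    print(f"peak estimate: {peak:.1f} GFLOPS")
    print(f"achieved fraction: {achieved / peak:.0%}")
```

Under that assumption the peak comes out at roughly 130 GFLOPS, so 50 GFlops from a real application is a bit under 40% of it.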

Wait a second now … IIRC bandwidthTest measures a[n] = b[n], where both a and b are in global memory. So each transfer involves one load as well as one store, meaning the raw bandwidth is twice what is reported. To convince yourself that this is so, you could try a += b[n], which should run twice as fast and return a measure more in line with what marketing would like to put in their press releases.

EDIT: Or maybe not? In any case, my current theoretical bandwidth is 8 GB/sec, but bandwidthTest reports only 4.8 …

The new card should arrive some time next week, so - if you can manage - hold your horses until then and I’ll post a measure you can compare to.

UPDATE: bandwidth for the Gigabyte GT220-OC, device to device is indeed about 24GB/sec. That will do for my purposes. I am not sure why the card has a fan? But it runs very cool as well as very silent (as in “unnoticeable”), so no complaints.

bandwidthTest already multiplies by a factor of 2 in computing the rate to account for the read and write. Most of my GTX 200-series cards get about 80% of their theoretical device-to-device bandwidth in bandwidthTest. You’re only getting 60%, so I wonder if this is related to the ratio of MPs to memory bandwidth or some other factor.
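The arithmetic behind those percentages can be sketched as follows. The factor of 2 counts each copied byte once as a read and once as a write. The efficiency figures use numbers quoted in this thread (4.8 of 8 GB/sec, and 13844.9 MB/sec against the 16 GB/sec quoted for DDR2). Two things here are assumptions: 1 MB is taken as 10^6 bytes, and the 5 ms copy time in the example is invented.

```python
def copy_bandwidth(bytes_copied, seconds):
    """Effective bytes/sec of a device-to-device copy: every byte is read once and written once."""
    return 2 * bytes_copied / seconds


def efficiency(measured, theoretical):
    """Fraction of the theoretical bandwidth that was actually measured (same units for both)."""
    return measured / theoretical


if __name__ == "__main__":
    # Hypothetical timing: the 33554432-byte transfer taking 5 ms.
    print(f"{copy_bandwidth(33554432, 0.005) / 1e9:.2f} GB/sec")

    # Figures quoted in the thread.
    print(f"8 GB/sec card:  {efficiency(4.8, 8.0):.0%}")
    print(f"DDR2 GT220:     {efficiency(13844.9e6, 16e9):.0%}")
```

By this measure the DDR2 GT220 is actually in line with the roughly 80% seen on the GTX 200-series cards; it is the 4.8 of 8 GB/sec result that stands out.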

GDDR3 models appear to be (no warranty given, check with the manufacturer):

Club 3D: CGNX-G222I (512MB)
MSI: N220GT‐MD1G/D3 (1GB)
Gigabyte: GV-N220OC-1GI (1GB)
Point of View: R-VGA150929-D3 (1GB)
ASUS: ENGT220/DI/1GD3 (1GB GDDR3, pretty sure)

Source: “Geforce G 210 & GT 220: Club 3D, Elitegroup, Gigabyte, Leadtek, PoV und Zotac - Update: Passive Geforce GT 220 von MSI”

However, some retailers do not seem to differentiate clearly between DDR3 and GDDR3. So it seems to be a hit-or-miss game to get the 32 GBytes/sec memory bandwidth.

Passively cooled models:

Elitegroup: NSGT220C-1GQS-H and NSGT220C-512QZ-H