Sounds like GK208 laptops/cards will support most sm_35 features

NVIDIA blog post here.

That’s great news for developers, since 255 registers per thread, Dynamic Parallelism and Hyper-Q are major features.

I assume 64-bit floating point will remain at 1/24th the throughput of single precision.

Perhaps this might fit into the “sm_32” compute capability category that was lurking in the CUDA 5.5 include files directory?

Unfortunately, some laptops with GT 730M chips have been on the market for months, and it seems impossible to tell whether a given model has the new GK208 chip or an earlier one.

NVIDIA deliberately does not list detailed tech specs on its 730M pages; apparently that’s precisely because of this product relabeling.

Not only is it very confusing, it’s also not customer-friendly: you won’t know what you’re getting.

Can someone from NVIDIA clarify what the compute capability of GK208 is? The Kayla dev kit is suggesting the purchase of a GeForce GT 640, which is apparently using a GK208 chip now. The CUDA on ARM presentation makes it sound like dynamic parallelism will be supported on Kayla, so does this mean that GK208 is really sm_35/sm_32 in both desktop and mobile parts?

Incidentally, if this is true, then it means that the model designator “GeForce GT 640” will have been sold with GPUs that have three different compute capabilities: 2.1, 3.0 and 3.2/3.5. That is absolutely crazy, to say the least.

There are more three-digit numbers that start with a 6, so please use them. :)

GK208 is sm_35.

Table 2 in the CUDA 5.5 Programming Guide lists the arithmetic throughputs of different basic operations for different capabilities. sm_35 is listed as having 1/3 rate FP64 throughput, but GK208 has only 1/24 throughput.
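To put that discrepancy in numbers, here is a rough Python sketch of theoretical peak throughput. It assumes the usual convention of counting one fused multiply-add (2 flops) per CUDA core per clock, and plugs in the GDDR5 GT 640 Rev. 2 figures that come up in this thread (384 cores at 1046 MHz). These are theoretical peaks, not measurements.

```python
def peak_sp_gflops(cores: int, clock_mhz: float) -> float:
    """Single-precision peak, counting one FMA (2 flops) per CUDA core per clock."""
    return cores * 2 * clock_mhz / 1000.0


def peak_dp_gflops(cores: int, clock_mhz: float, dp_ratio: float) -> float:
    """Double-precision peak expressed as a fraction of the single-precision peak."""
    return peak_sp_gflops(cores, clock_mhz) * dp_ratio


# GDDR5 GT 640 Rev. 2 figures quoted in this thread: 384 cores at 1046 MHz.
CORES, CLOCK_MHZ = 384, 1046

print(f"FP32 peak:             {peak_sp_gflops(CORES, CLOCK_MHZ):6.1f} GFLOPS")
print(f"FP64 at 1/3 (Table 2): {peak_dp_gflops(CORES, CLOCK_MHZ, 1 / 3):6.1f} GFLOPS")
print(f"FP64 at 1/24 (GK208):  {peak_dp_gflops(CORES, CLOCK_MHZ, 1 / 24):6.1f} GFLOPS")
```

With these inputs the FP32 peak is about 803 GFLOPS, so the 1/3 rate from Table 2 would imply roughly 268 GFLOPS of FP64, while the 1/24 rate gives roughly 33 GFLOPS.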

allanmac spotted a set of sm_32 intrinsic header files in the CUDA 5.5 toolkit. I hypothesized that GK208 would be sm_32, with sm_32 having all the sm_35 features except FP64 rate, similar to how sm_12 is the same as sm_13 except FP64 support.

I assume the programming guide will be updated for CUDA 5.5?
And that leaves the question: what is sm_32?

So it also looks like you can buy certain versions of the GT 630 and GT 640 (Rev. 2), as well as the GT 635, that all have the GK208 chip.

At least according to Wikipedia’s “List of Nvidia graphics processing units”.

Hence you might want to go for one of those discrete cards instead of buying a new laptop…

@Jimmy, also notice that Wikipedia reports that some of the GK208 discrete cards have PCIe 2.0 x8 while the single SMX GT 635 has PCIe 3.0 x16. Assuming Wikipedia is correct, I would be semi-disappointed in a PCIe 2.0 x8 board even though it probably doesn’t matter at all.

Another tell-tale for GK208 might be the default graphics clock: 902 MHz for DDR3 and 1046 MHz for GDDR5 GT 640 Rev. 2 cards.

Of course all this is just speculation. Someone should buy one and report back to us if our hypotheses are correct. :)

I think I might just do that. :)

I’ve been wanting to try out dynamic parallelism for a while, but we don’t have the budget in our lab for new GPUs at the moment. For $90, I can buy it myself. The device linked from the Kayla page is this one:

which seems to be the only GeForce GT 640 with GDDR5 memory on Newegg. The clock rate matches the GK208 listing on Wikipedia, so I think it is the right one.

I was also perplexed by the PCIe 2.0 spec. I’m wondering if they are using 2.0 to achieve insanely good performance per watt. Looking at [1], it has 697 GFLOPS @ 25 W => 27.88 GFLOPS/W, which is probably among the best I’ve seen for an AMD/Nvidia GPU. Seems too good to be true, though…

[1] NVIDIA GeForce GT 630 Rev. 2 PCIe x8 Specs | TechPowerUp GPU Database

Here’s an existence proof, from the PCI IDs database:

10de  NVIDIA Corporation
	1280  GK208 [GeForce GT 635]
	1282  GK208 [GeForce GT 640 Rev. 2]
	1284  GK208 [GeForce GT 630 Rev. 2]
	1290  GK208M [GeForce GT 730M]
		103c 2afa  GeForce GT 730A
		103c 2b04  GeForce GT 730A
		1043 13ad  GeForce GT 730M
		1043 13cd  GeForce GT 730M
	1291  GK208M [GeForce GT 735M]
	1292  GK208M [GeForce GT 740M]
	1293  GK208M [GeForce GT 730M]
	1294  GK208M [GeForce GT 740M]
	12a0  GK208
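Since the marketing names overlap, the PCI device ID is a more reliable way to tell a GK208 apart. Below is a small Python sketch that uses only the IDs from the excerpt above. On Linux, `lspci -nn` prints each device's `[vendor:device]` pair (e.g. `[10de:1282]`), which the hypothetical helper `parse_pci_id` turns into integers. Treat a miss as "unknown" rather than "not GK208", since the table covers only the IDs listed here.

```python
from typing import Optional, Tuple

NVIDIA_VENDOR_ID = 0x10DE

# Device IDs from the pci.ids excerpt above that map to GK208 / GK208M.
GK208_DEVICE_IDS = {
    0x1280: "GK208 [GeForce GT 635]",
    0x1282: "GK208 [GeForce GT 640 Rev. 2]",
    0x1284: "GK208 [GeForce GT 630 Rev. 2]",
    0x1290: "GK208M [GeForce GT 730M]",
    0x1291: "GK208M [GeForce GT 735M]",
    0x1292: "GK208M [GeForce GT 740M]",
    0x1293: "GK208M [GeForce GT 730M]",
    0x1294: "GK208M [GeForce GT 740M]",
    0x12A0: "GK208",
}


def parse_pci_id(pair: str) -> Tuple[int, int]:
    """Parse a 'vendor:device' hex pair as printed by `lspci -nn`, e.g. '[10de:1282]'."""
    vendor, device = pair.strip().strip("[]").split(":")
    return int(vendor, 16), int(device, 16)


def identify_gk208(vendor_id: int, device_id: int) -> Optional[str]:
    """Return the pci.ids name if the ID is a listed GK208, else None (meaning: unknown)."""
    if vendor_id != NVIDIA_VENDOR_ID:
        return None
    return GK208_DEVICE_IDS.get(device_id)


# The last pair is just an NVIDIA ID that is not in the excerpt.
for pair in ["10de:1282", "[10de:1290]", "10de:0fc1"]:
    print(pair, "->", identify_gk208(*parse_pci_id(pair)) or "not in the GK208 list")
```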

Unfortunately it seems no one has run GPU-Z on a discrete GK208 in the wild as there is no record in the GPU-Z database.

Leave it to NVIDIA to make one card name cover 3 different compute capabilities… sigh, incredibly confusing. That being said, it’s nice that dynamic parallelism is coming to new and cheap cards!

I just ordered a Lenovo Y410p, although it seems that its GT 750M is a GK107 chip, at least according to one report. I’m going to give it a trial run regardless… Lenovo has a 30-day, no-questions-asked return policy ;)

To add to the discussion, here are two more desktop GT 630 cards by Zotac that should be GK208-based, given their core count of 384. Both are available on Newegg.

- 1 GB version
- 2 GB version

That Zotac GT 630 is also rated at 25 W max… Extremely good performance per watt! I understand why NV has used the GK208 for laptops!

That being said, does anyone know of any 14" laptops with GK208?

Answering my own question for now (albeit for 15"):
TechnologyGuide - TechTarget

It seems the HP ENVY 15t-j000 has a GK208, according to the post above. Link for sale:

The one above is either 1366x768 or 1920x1080 (in my opinion, too high a resolution for a 15.6" screen).
I can’t recommend HP laptops because they tend to whitelist their Wi-Fi cards, and I already have an Intel 7260 Dual Band AC card that I intend to use.

From browsing Computex news a bit: Acer is releasing the S3-392 (perhaps sometime in July?), which sports a 1080p touchscreen in a 13.3" form factor with a GT 735M (GK208) chip:

There is also the VAIO Fit 14 (1600x900, Ivy Bridge), which can be configured with a GT 735M chip:
For what it’s worth, I dropped by a Sony Store the other day and asked which SSD choices would be offered for the models that are configurable with an SSD. Apparently they use a proprietary interface, and according to the tech the motherboard does not have a regular 2.5" slot. I’m inclined to believe it, given that the drive model name reported by Device Manager was a Samsung-based SSD that did not show up on Google. The model with the “(5400rpm) + 8GB SSD hybrid hard drive” uses a Toshiba MQ01ABD075H (9.5mm height), so that gives plenty of options for upgrades (7.5mm SSDs with a spacer, and 9.5mm SSDs).

Also from Sony, the VAIO Fit 14E (1600x900, Ivy Bridge) can be configured with a GT 740M (1 or 2GB VRAM); however, it’s unclear whether that model uses the GK208 chip. Presumably it does: I cannot find information about the model number anywhere, and Sony chat support says it is a model that has ‘not released’ yet, implying it is new and most likely uses GK208 instead of GK107. But buyer beware!

It doesn’t make sense to me to upgrade just the video card. Ideally I’d like a GK208 in a 14" form factor, either 1366x768 or 1600x900, with a Haswell (4th Gen) Core i5 or i7 processor, but nothing like that exists so far.

Edit 1: I’ve confirmed another laptop model with GK208: the Asus VivoBook S551LB. The author of the extensive review mentions it is sold in the US as the Asus VivoBook V551LB. It is available on Amazon and a few other retailers, two of which are the ideal choice, as they offer 15-day and 45-day return policies by default with no restocking fees. I tried it out from Best Buy: besides losing VT-d, the trackpad is an Elantech one and it jumps pretty badly, regardless of which drivers I used. Too bad, because otherwise the laptop was pretty decent, but it’s going back because of that issue alone. The screen is actually decent enough despite the poor viewing angles, and although the reviewer mentioned it was quite reflective, I didn’t see a problem with glare.

Edit 2: Another laptop that has GK208 is the TOSHIBA Satellite S55-A5279, and probably also the S55-A5276. I ordered the S55-A5279 from Rakuten because of the generous 45-day return policy, and so far my only gripe is the short battery life, given the battery specs are 14.4V, 2838 mAh. It’s identified as a PEGA G71C000FP110 by the BatteryInfoView software. Other than that, it seems to meet my expectations. The fan does get a bit loud when you push the CPU, but that’s normal for pretty much any laptop.

Compared to the VivoBook, this Toshiba S55 is thicker, with all-plastic construction versus an aluminum top on the VivoBook. That being said, under normal browsing it stays very cool: CPUID HWMonitor shows about 4-5W of power use on the processor as I type this on the S55. The Toshiba’s 4-core (8-thread) i7-4700MQ processor is not VT-d capable; however, it might be upgradeable down the line, as Intel does ship i7-4800MQ and i7-4900MQ processors as boxed units.

The Toshiba also has a VGA (RGB) port in addition to HDMI, which means it should be able to drive 2 monitors natively, but I will have to check this soon. On that note, I also want to see if I’m able to drive a 2560x1440 or 2560x1600 resolution via the HDMI port.

One plus I saw on the Asus versus the Toshiba is much better battery life. The Asus has a 3-cell, 11.1V, 4500 mAh, 50 Wh battery, vs the S55’s 4-cell, 14.4V, 2838 mAh, 43 Wh battery. For that matter, the max TDP of the Toshiba’s 4700MQ is 47W vs 15W for the Asus’ 4500U. The Asus has a 65W power brick (19 VDC @ 3.42 A) vs 120W on the Toshiba (19 VDC @ 6.32 A), so the Toshiba can definitely draw a lot more power due to its beefy processor. Needless to say, the Asus beats the Toshiba in battery life.

My GK208-based GT 640 arrived today (apparently I’m super close to a Newegg warehouse)! After upgrading my Ubuntu 12.04 x86_64 system to the CUDA 5.5 RC, I get the following results:


Device 1: "GeForce GT 640"
  CUDA Driver Version / Runtime Version          5.5 / 5.5
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 1023 MBytes (1073020928 bytes)
  ( 2) Multiprocessors x (192) CUDA Cores/MP:    384 CUDA Cores
  GPU Clock rate:                                1046 MHz (1.05 GHz)
  Memory Clock rate:                             2505 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 524288 bytes
  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     2147483647 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           4 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

bandwidthTest (my motherboard is PCI-E 2.0)

Device 1: GeForce GT 640
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			3184.9

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			3198.7

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			32036.5

… and the various dynamic parallelism demos (cdp* in the bin release directory) work too.
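As a sanity check on the device-to-device number, the theoretical DRAM bandwidth can be computed from deviceQuery's memory clock and bus width, using the double-data-rate convention (clock × 2 × bus width in bytes) that the CUDA C Best Practices Guide uses for theoretical bandwidth. This sketch assumes bandwidthTest's device-to-device figure counts both the read and the write of each byte, and it ignores the MB vs MiB distinction, which shifts the ratio by a few percent.

```python
def dram_peak_mb_s(mem_clock_mhz: float, bus_width_bits: int) -> float:
    """Theoretical DRAM bandwidth in MB/s: double data rate x bus width in bytes."""
    return 2 * mem_clock_mhz * bus_width_bits / 8


PEAK = dram_peak_mb_s(2505, 64)  # deviceQuery: 2505 MHz memory clock, 64-bit bus
D2D_MEASURED = 32036.5           # bandwidthTest device-to-device result, MB/s

print(f"theoretical {PEAK:.0f} MB/s, measured {D2D_MEASURED} MB/s "
      f"({D2D_MEASURED / PEAK:.0%} of peak)")
```

That puts the measured figure at roughly 80% of a roughly 40 GB/s theoretical peak, which suggests the on-board GDDR5 is performing normally.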

I am very surprised by the very low host/device memory bandwidth. That looks suspiciously like PCI-E 1.0 or PCI-E 2.0 with an x8 connection. I need to see if there is a good way to tell what PCI-E link settings were negotiated at bootup…
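The suspicion can be made concrete with a little arithmetic. This Python sketch computes the per-direction payload ceiling of a few PCIe link configurations from the per-lane signalling rate and line-code overhead (8b/10b for Gen 1 and 2, 128b/130b for Gen 3). It ignores packet and protocol overhead, which is why real pinned transfers typically land somewhat below these ceilings.

```python
# Per-lane signalling rate (GT/s) and line-code efficiency for each PCIe generation.
# Gen 1 and Gen 2 use 8b/10b encoding; Gen 3 uses 128b/130b.
PCIE_GEN = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
}


def pcie_peak_mb_s(gen: int, lanes: int) -> float:
    """Peak payload bandwidth per direction in MB/s (10^6 bytes/s).

    Ignores packet/protocol overhead, so real transfers land noticeably lower.
    """
    gt_per_s, efficiency = PCIE_GEN[gen]
    return gt_per_s * 1e9 * efficiency * lanes / 8 / 1e6


MEASURED_H2D = 3184.9  # MB/s, pinned host-to-device result from bandwidthTest

for gen, lanes in [(1, 16), (2, 8), (2, 16), (3, 8)]:
    peak = pcie_peak_mb_s(gen, lanes)
    print(f"Gen {gen} x{lanes:<2}: {peak:7.1f} MB/s peak, "
          f"measured H2D = {MEASURED_H2D / peak:.0%} of it")
```

PCIe 1.x x16 and PCIe 2.0 x8 share the same 4000 MB/s ceiling, so the measured ~3185 MB/s (about 80% of it) cannot tell them apart on its own, while a PCIe 2.0 x16 link would allow roughly twice that. The Gen 3 x8 line gives a rough idea of what a PCIe 3.0 x8 link could offer.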

Update: Although lspci -vv is reporting some strange information (like wrong link rates for cards I can verify are going at full PCI-E 2.0 speeds), it does seem to indicate that this card negotiated an x8 link with the host. This workstation should be able to do x16 on the slot I used, so I’ll need to investigate what’s going on.

Update to the update: I just noticed that earlier in the thread allanmac mentioned rumors of these cards being PCI-E 2.0 x8. Although the ASUS card I bought doesn’t explicitly say either way in the documentation I found (boo!), a very similar Gigabyte GT 640 does list the card as being PCI-E 2.0 x8. So, I think this is a real “feature” of these GK208 desktop cards.

Thanks, Seibert! Very interesting! It really seems to me that they’ve cut the D2H and H2D bandwidth, potentially to save on power consumption?

Ha! Great job! That confirms a lot.

It’s cool that you can now get sm_35 in a 2 SMX card.

The last remaining question is whether it can negotiate a PCIe 3.0 x8 connection?

Also, it’s probably too late since it’s inside your case but you should be able to inspect the PCIe fingers and see if there are actually traces to all of the lanes. :)

It sounds like you’re not on Windows but GPU-Z does a good job of detecting the current PCIe rate. You can actually see GPUs downclock to PCIe 1.1 when idle. For this reason, prodding your card with a kernel before looking at its PCIe speed is recommended. :)

Ah! That was it. If I run some CUDA application on the GPU in the other window, lspci reports the correct transfer rates. It’s still x8, but at least it does clock up to 5 GT/s when I’m using it.
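For checking this from a script, the two relevant `lspci -vv` lines are `LnkCap` (what the device advertises) and `LnkSta` (what was negotiated). The Python sketch below pulls speed and width out of both. The sample text is made up for illustration (not captured from this card) and mimics an idle link that has dropped to 2.5 GT/s. As described above, load the GPU first, and expect `lspci -vv` to need root before it shows these capability lines.

```python
import re

# Matches the "Speed ...GT/s" and "Width x.." fields of an lspci LnkCap/LnkSta line.
# Newer lspci versions may append "(ok)" or "(downgraded)"; the lazy .*? skips over it.
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")


def parse_link_lines(lspci_output: str) -> dict:
    """Return {'LnkCap': (gt_per_s, width), 'LnkSta': (gt_per_s, width)} for lines found."""
    links = {}
    for line in lspci_output.splitlines():
        m = LINK_RE.search(line)
        if m:
            links[m.group(1)] = (float(m.group(2)), int(m.group(3)))
    return links


def link_downgraded(links: dict) -> bool:
    """True if the negotiated link is slower or narrower than the advertised one."""
    cap, sta = links.get("LnkCap"), links.get("LnkSta")
    if cap is None or sta is None:
        return False
    return sta[0] < cap[0] or sta[1] < cap[1]


# Hypothetical sample, shaped like `lspci -vv` output for an idle card.
SAMPLE = (
    "\t\tLnkCap:\tPort #0, Speed 5GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <512ns\n"
    "\t\tLnkSta:\tSpeed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt-\n"
)

links = parse_link_lines(SAMPLE)
print(links, "downgraded:", link_downgraded(links))
```

To feed it real output, something like `subprocess.run(["lspci", "-vv", "-s", "04:00.0"], capture_output=True, text=True).stdout` should work; `04:00.0` is only a guess derived from deviceQuery's "PCI Bus ID / PCI location ID: 4 / 0" line.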

I agree that the limited PCI-E could be a strategy to reduce power consumption. This card is low-profile compatible (after removing the VGA port), though they don’t give you the low profile faceplate. In addition, the GT 640 is probably the nicest CUDA device I’ve seen that doesn’t require any PCI-Express power connectors.

Sadly, I just took inventory around here, and the only PCI-Express 3.0 motherboard we have is one of our semi-critical disk servers. The grad students would be slightly displeased if I took it offline to go joy-riding with a GPU. :)

Sounds like this would be a perfect test of PCIe Hot Plug! <kidding!>

When I looked a few days ago, there might have been a ZOTAC board with 2GB of GDDR5, but it’s really tough to tell if it’s GK208. It was just released, though, and has the tell-tale clock speed.

I ordered the Zotac GT630 to give it a whirl myself. If nothing else, I can get rid of a GK107 GT640 2GB I’m not using anymore to offset the cost. I should have it on Monday and I can try it on my PCI-E 3.0 motherboard.