Quick test wanted on new MachbookPro

I’d like to ask a one-minute favor of anyone on the forum who has a new MacBook Pro with the dual GPUs.
Could you run the SDK’s bandwidthTest application and post the results?

I am very interested in the transfer speeds for both the 9400M and 9600M. I wonder if the 9400M may actually be faster for transfers because of its embedded design!

Thanks very much if you could help me out!

Edit: Sigh, of course I made a typo in the topic title, and it doesn’t look like those can be edited. So please enjoy teasing me about my misplaced interest in Machbooks rather than MacBooks.

I don’t think this is testable yet. Isn’t only the 9600M visible from Windows? And there’s no 2.2 beta for OS X.

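Also, the 9400M’s bandwidth doesn’t matter, because you should never be performing memcpys on it: it is an integrated GPU that shares system RAM, so CUDA 2.2’s zero-copy (mapped host memory) access avoids the copy altogether.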

Shouldn’t both GPUs be visible to OS X even in CUDA 2.1?

And for the 9400M, I understand you’d never use memcpy in CUDA 2.2, but in 2.1 and earlier there’s no choice.

OK, I’ve attached a couple of files giving the bandwidth for both devices (shmoo mode) with pinned memory. I’ve not had a chance to study them, but I’d be interested in seeing some graphs (and any variations). FYI, this is a 17" MacBook Pro with the new firmware update applied, running CUDA 2.0 for OS X; it will be interesting to see whether 2.2 brings any speedups. BTW, with double the number of cores and nearly double the clock rate, my money is on the 9600M for doing work.
test1.txt (4.77 KB)
test0.txt (4.84 KB)
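
For anyone who wants to graph these, here is a rough Python sketch for pulling the numbers out of the attachments. The exact layout of the files isn't shown in this thread, so it assumes only that each data row is two whitespace-separated numbers (transfer size, then bandwidth) and skips everything else. The function names `parse_bandwidth_rows` and `peak` are made up for illustration.

```python
def parse_bandwidth_rows(text):
    """Return (transfer_size, bandwidth) pairs found in bandwidthTest-style text.

    Assumption: a data row is exactly two whitespace-separated numbers,
    an integer size followed by a bandwidth figure. Headers, progress
    dots, blank lines and status lines are ignored.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # not a two-column row
        try:
            size = int(parts[0])
            bandwidth = float(parts[1])
        except ValueError:
            continue  # two tokens, but not both numeric
        rows.append((size, bandwidth))
    return rows


def peak(rows):
    """Return the row with the highest bandwidth, or None if there are no rows."""
    return max(rows, key=lambda row: row[1]) if rows else None
```

Something like `parse_bandwidth_rows(open("test0.txt").read())` should then give a list you can feed to any plotting tool. If a file holds several sections (host-to-device, device-to-host, device-to-device), their rows will come out concatenated, so you would probably want to split the text by section first.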

simon, thanks for those runs!

The 9600M GT bandwidth is only about 15GB/sec… I expected more from its specs.

The 9400M speeds are also interesting. I think they’re mostly a measurement of the machine’s own RAM bandwidth.

The limiter is the PCIe bus, not the CPU or RAM. Remember, the trick with a GPU is that once you have the job on the device, you keep it there. You can upload new kernels, but keep the data on the device; it’s the internals of the GPU that do the fast stuff. And that’s just where the story starts: check out the CUDA lectures online. The device itself has a fairly complex memory architecture that you really must understand to do any kernel development. You’ll soon forget about that PCIe bus. Check out the device-to-device bandwidth for comparison; if you intend to rely on host-device round trips, you’ll be missing the trick here. Have fun! :)
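
To put rough numbers on the round-trip point, here is a back-of-envelope sketch in Python. It counts bus time only (no kernel time, no per-transfer latency), and the function names and any figures you plug in are hypothetical; read the real host-device bandwidth off the attached results.

```python
def transfer_seconds(num_bytes, bandwidth_bytes_per_s):
    """Time to move num_bytes across a link of the given bandwidth."""
    return num_bytes / bandwidth_bytes_per_s


def round_trip_every_pass(num_bytes, passes, host_device_bw):
    """Bus time if the data is uploaded and downloaded again on every pass."""
    return passes * 2 * transfer_seconds(num_bytes, host_device_bw)


def keep_on_device(num_bytes, passes, host_device_bw):
    """Bus time if the data is uploaded once, processed in place for all
    passes, and downloaded once at the end. The pass count does not
    affect bus time, which is the whole point."""
    return 2 * transfer_seconds(num_bytes, host_device_bw)
```

For example, with a made-up 256e6-byte buffer, 100 passes and a placeholder 2e9 bytes/s link, the first approach spends about 25.6 s on the bus and the second about 0.256 s. In this simple model the ratio is always just the number of passes.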

Cheers,

simon