GT650M a Kepler part?

Is the nVidia GT650M used in the new MacBook Pros a Kepler device, or is it some older Fermi model?

Christian

Update: I found a table on an nVidia site that lists the device as a Compute 3.0 part. Still, if someone could post the deviceQuery output for this chip, I’d be grateful.

Yes, it seems to be Kepler:

“Based on the next-generation NVIDIA Kepler graphics architecture, the GeForce GT 650M offers unprecedented performance and extreme energy efficiency, giving it the muscle to process the 5,184,000 pixels in the next-gen MacBook Pro’s ultra high-resolution display. The GeForce GT 650M is not only up to the task, it maximizes power efficiency along the way.”

http://benchmarkreviews.com/index.php?option=com_content&task=view&id=19066&Itemid=99999999

Found 1 CUDA Capable device(s)

Device 0: "GeForce GT 650M"

  CUDA Driver Version / Runtime Version          4.2 / 4.2

  CUDA Capability Major/Minor version number:    3.0

  Total amount of global memory:                 2048 MBytes (2147483648 bytes)

  ( 2) Multiprocessors x (192) CUDA Cores/MP:    384 CUDA Cores

  GPU Clock rate:                                405 MHz (0.41 GHz)

  Memory Clock rate:                             2000 Mhz

  Memory Bus Width:                              128-bit

  L2 Cache Size:                                 524288 bytes

  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)

  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048

  Total amount of constant memory:               65536 bytes

  Total amount of shared memory per block:       49152 bytes

  Total number of registers available per block: 65536

  Warp size:                                     32

  Maximum number of threads per multiprocessor:  2048

  Maximum number of threads per block:           1024

  Maximum sizes of each dimension of a block:    1024 x 1024 x 64

  Maximum sizes of each dimension of a grid:     2147483647 x 65535 x 65535

  Maximum memory pitch:                          2147483647 bytes

  Texture alignment:                             512 bytes

  Concurrent copy and execution:                 Yes with 1 copy engine(s)

  Run time limit on kernels:                     Yes

  Integrated GPU sharing Host Memory:            No

  Support host page-locked memory mapping:       Yes

  Concurrent kernel execution:                   Yes

  Alignment requirement for Surfaces:            Yes

  Device has ECC support enabled:                No

  Device is using TCC driver mode:               No

  Device supports Unified Addressing (UVA):      No

  Device PCI Bus ID / PCI location ID:           1 / 0

  Compute Mode:

     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Thanks! We are considering shipping our simulation environment on a laptop with this GPU. What worries me, however, is the narrow 128-bit memory interface: I am concerned about the achievable memory bandwidth and whether it would limit our application.
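For a rough sense of the ceiling, the theoretical peak bandwidth can be estimated from the two memory lines in the deviceQuery output above. This is only a sketch: it assumes the reported 2000 MHz memory clock is doubled for double-data-rate transfers (the formula commonly applied to deviceQuery values), and the function name `peak_bandwidth_gb_s` is made up for illustration.

```python
def peak_bandwidth_gb_s(memory_clock_mhz: float,
                        bus_width_bits: int,
                        transfers_per_clock: int = 2) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 is an ASSUMPTION: it treats the reported
    memory clock as a double-data-rate clock.
    """
    bytes_per_transfer = bus_width_bits / 8  # 128-bit bus -> 16 bytes
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return transfers_per_second * bytes_per_transfer / 1e9


# Values taken from the deviceQuery output: 2000 MHz, 128-bit bus.
gt650m_peak = peak_bandwidth_gb_s(memory_clock_mhz=2000, bus_width_bits=128)
print(f"Estimated peak: {gt650m_peak:.1f} GB/s")  # Estimated peak: 64.0 GB/s
```

Under that assumption the bus tops out at about 64 GB/s, so whether it is limiting depends on how many bytes the simulation has to move per time step.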

No problem.

Here is a recent forum post where I achieved 76% bandwidth utilization on this laptop: The Official NVIDIA Forums | NVIDIA

It generally seems a bit harder to achieve high bandwidth utilization on Kepler than on Fermi, which I also attribute to the narrow memory interfaces.
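To put the 76% figure into absolute terms, here is a small back-of-the-envelope sketch. It assumes the utilization is measured against a theoretical peak of 64 GB/s (2000 MHz x 2 transfers per clock x 128 bits / 8); the thread does not state the baseline, so treat that as an assumption. The helper names are hypothetical.

```python
# ASSUMED baseline: 2000 MHz * 2 (double data rate) * 16 bytes = 64 GB/s.
PEAK_GB_S = 64.0
UTILIZATION = 0.76                  # figure quoted in the post above
GLOBAL_MEMORY_BYTES = 2147483648    # from the deviceQuery output (2048 MBytes)


def achieved_bandwidth_gb_s(peak_gb_s: float, utilization: float) -> float:
    """Effective bandwidth implied by a utilization fraction of the peak."""
    return peak_gb_s * utilization


def seconds_to_stream(num_bytes: int, bandwidth_gb_s: float) -> float:
    """Time to move num_bytes once at the given bandwidth (1 GB = 1e9 bytes)."""
    return num_bytes / (bandwidth_gb_s * 1e9)


achieved = achieved_bandwidth_gb_s(PEAK_GB_S, UTILIZATION)
sweep = seconds_to_stream(GLOBAL_MEMORY_BYTES, achieved)
print(f"Implied effective bandwidth: {achieved:.1f} GB/s")
print(f"One pass over all of global memory: {sweep * 1e3:.1f} ms")
```

Under these assumptions the card sustains a little under 49 GB/s, so a single read of the full 2 GB takes roughly 44 ms. Comparing that against the data volume your kernels touch per step gives a first estimate of whether the 128-bit interface would be the bottleneck.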