I have two desktop machines (GTX 260 & 9500 GT) for CUDA work under Linux (numerical simulations), but I often need to work on the road, so I've decided to buy a netbook - one of the new ION machines, an HP Mini 311 in particular. So the question is: does CUDA on the ION/ION LE work smoothly under Linux (openSUSE 11.x)?
Yes, it does work. Here is the SDK deviceQuery output for the GeForce 8200, the ION's precursor for AMD boards, on a CentOS 5.4 GPU compute node:
[root@bdgpu-n01 ~]# /usr/local/cuda_sdk/C/bin/linux/release/deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)
There are 2 devices supporting CUDA
Device 0: "GeForce GTX 275"
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 939261952 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.48 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Device 1: "GeForce 8200"
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 265617408 bytes
Number of multiprocessors: 1
Number of cores: 8
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.20 GHz
Concurrent copy and execution: No
Run time limit on kernels: No
Integrated: Yes
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Test PASSED
Press ENTER to exit...
By default, CUDA apps pick the GTX 275.
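(If you want a program to run on the IGP instead, you can pick it explicitly through the runtime API. Here's a minimal sketch of my own, not from the SDK - it just enumerates the devices and selects the integrated one:)

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);

    // Print every device and select the integrated one (the 8200/ION)
    // instead of relying on the default choice of device 0.
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s (integrated: %s)\n",
               i, prop.name, prop.integrated ? "yes" : "no");
        if (prop.integrated)
            cudaSetDevice(i);  // later CUDA calls in this thread use this device
    }
    return 0;
}
```

(The SDK samples take the shortcut of a --device=N flag, which is what I use below.)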
And here are the SDK nbody -benchmark numbers (a whopping 11.543 GFLOP/s!):
[root@bdgpu-n01 ~]# /usr/local/cuda_sdk/C/bin/linux/release/nbody -benchmark --device=1
Run "nbody -benchmark [-n=<numBodies>]" to measure perfomance.
Using device 1: GeForce 8200
1024 bodies, total time for 100 iterations: 181.675 ms
= 0.577 billion interactions per second
= 11.543 GFLOP/s at 20 flops per interaction
Also, here are the 8200 bandwidthTest numbers (with those of the GTX 275 for comparison):
[root@bdgpu-n01 ~]# /usr/local/cuda_sdk/C/bin/linux/release/bandwidthTest --memory=pinned --device=1
Running on......
device 1:GeForce 8200
Quick Mode
Host to Device Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2261.8
Quick Mode
Device to Host Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2259.9
Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4327.3
&&&& Test PASSED
Press ENTER to exit...
[root@bdgpu-n01 ~]# /usr/local/cuda_sdk/C/bin/linux/release/bandwidthTest --memory=pinned --device=0
Running on......
device 0:GeForce GTX 275
Quick Mode
Host to Device Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2714.1
Quick Mode
Device to Host Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2864.6
Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 107162.8
&&&& Test PASSED
Press ENTER to exit...
The internal memory bandwidth of 4.3 GB/s is, shall we say, on the low side for a CUDA-capable device, but the host-to-device and device-to-host rates (pinned memory) are almost the same as for the GTX 275. (Although off-topic, it should be noted that these are 2U rackmount machines and that the GTX 275s are connected with a flexible riser. The device-to-host number doubles when the GT200 GPU is either directly mounted in the PCIe x16 2.0 slot or connected with a rigid riser. Unfortunately the flexible riser was necessary to fit the GPU in the 2U box.) Anyway, the ION is compute capability 1.1. Have fun!
I hesitate to call the performance of the GeForce 8200 (or any other IGP) “good”. If asked to choose, I would probably go with “bad”, but maybe that is unfair, since there are probably whole classes of CUDA applications (yet unwritten?) that would be a good match for the IGP.
I am not an expert on IONs; however, the ION LE, the lower-powered version, has the same number of cores (8) as the 8200, and appears to have equivalent specs. YMMV
There are a lot of real-time signal processing applications that fit in a single MP quite well (and a standard ION (not LE) has 2…), and aren't bandwidth limited… in those cases you're purely clock limited, and the clock of an ION is quite close to that of any high-end GeForce card - so there the ION is pretty damn fine (2-8x the FLOPs of various low-power VIA/Atom/ARM processors)…
Oh, and then you have tricks for the ION like dual-purpose kernels, which execute different code paths (‘virtual kernels’) on each MP, effectively getting a poor man's asynchronous kernel execution (with varying kernels) over the 2 MPs…
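(The dual-purpose kernel trick can be sketched roughly like this - my own illustrative code, not a tested implementation. It assumes a 2-block launch, so that with 2 MPs the scheduler puts one block on each; note the hardware doesn't strictly guarantee that mapping:)

```cuda
#include <cuda_runtime.h>

// One physical kernel, two 'virtual kernels': branch on blockIdx.x so
// each of the ION's 2 MPs runs a different code path in the same launch.
__global__ void dualKernel(float *a, float *b, int n)
{
    if (blockIdx.x == 0) {
        // virtual kernel A, e.g. a scaling pass
        for (int i = threadIdx.x; i < n; i += blockDim.x)
            a[i] *= 2.0f;
    } else {
        // virtual kernel B, e.g. an offset pass, running concurrently on the other MP
        for (int i = threadIdx.x; i < n; i += blockDim.x)
            b[i] += 1.0f;
    }
}

// launch with exactly 2 blocks, e.g.:  dualKernel<<<2, 128>>>(d_a, d_b, n);
```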
So, my HP Mini 311 is here. CUDA works perfectly (but slowly, of course :)) under openSUSE 11.2. Here is the deviceQuery output:
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA
Device 0: "ION LE"
CUDA Driver Version: 3.0
CUDA Runtime Version: 3.0
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 131792896 bytes
Number of multiprocessors: 2
Number of cores: 16
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.10 GHz
Concurrent copy and execution: No
Run time limit on kernels: Yes
Integrated: Yes
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 134575159, CUDA Runtime Version = 3.0, NumDevs = 1, Device = ION LE
The default compiler in openSUSE 11.2 is GCC 4.4, which is not supported by the CUDA toolchain. So, to get it working, do the following: