How can multiple video cards be used in parallel?
With the CUDA driver API this is possible; I am not sure if C/C++/CUDA C/C++ has anything special for it.
It basically comes down to using multiple devices, multiple contexts, one context per device.
Either you use one CPU thread and switch between contexts, or you could multi-thread it and have one context per thread. (The CUDA driver API is supposed to be thread-safe, I think… not sure, haven’t tested that yet.)
So suppose one thread, then:
EnterContext // probably push api
// use device, call api’s, launch kernels, etc, it will all be done on the current context.
LeaveContext // probably pop api
EnterAnotherContext
// use other device
LeaveAnotherContext
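For illustration, a minimal sketch of that pattern with the actual driver API push/pop calls, assuming two devices are present; error checking is omitted and the “use device” parts are placeholders:

#include <cuda.h>

int main(void)
{
    CUdevice dev0, dev1;
    CUcontext ctx0, ctx1;

    cuInit(0);
    cuDeviceGet(&dev0, 0);
    cuDeviceGet(&dev1, 1);

    // cuCtxCreate makes the new context current, so pop it right away
    // to start from a clean “no current context” state.
    cuCtxCreate(&ctx0, 0, dev0);
    cuCtxPopCurrent(NULL);
    cuCtxCreate(&ctx1, 0, dev1);
    cuCtxPopCurrent(NULL);

    cuCtxPushCurrent(ctx0);   // EnterContext
    // ... use device 0: allocate memory, load modules, launch kernels ...
    cuCtxPopCurrent(NULL);    // LeaveContext

    cuCtxPushCurrent(ctx1);   // EnterAnotherContext
    // ... use device 1 ...
    cuCtxPopCurrent(NULL);    // LeaveAnotherContext

    cuCtxDestroy(ctx0);
    cuCtxDestroy(ctx1);
    return 0;
}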
The difficulty is probably in dividing the work among devices; that is problem/algorithm specific.
It might be easier if kernels could launch over multiple devices.
Perhaps a deviceIdx and deviceDim could be added; perhaps this already exists, but I haven’t seen it, and it’s not in the documentation, so it probably doesn’t exist.
Devices probably also have their own memory space, but a solution might be unification/unified addressing, which is a new feature only available on Tesla cards…
So “unified addresses” are probably a thing for the future for consumer cards, if they ever get that functionality.
I think heat could be an issue, so I am not betting on a multi-device future for now :)
You need to visit the good old multi-GPU thread in which Mr. Anderson (Mr. then, Dr. now) showed a beautiful way to do multi-GPU programming using multiple threads.
Although CUDA 4.0 brings in so much new stuff, the old thread is still relevant and elegant.
http://forums.nvidia.com/index.php?showtopic=66598&st=0
Best Regards,
Sarnath
The code is not available from that thread.
Elegant is subjective here… if you want to be able to simply make API calls for any context, then yes, it would be elegant, but also overheadish… all the binds are probably pushing/popping contexts all the time…
That was my initial idea too, but it gets annoying after a while, for two reasons:
- Either the API wrappers have to do the bind-related stuff, which makes their design more complex and also more overheadish,
or
- Saddle the user with all that stuff: gpu0.call(bind(…))
^ Having to write that for each call is kinda annoying… and not the goal of my framework, which is low overhead and productivity/less typing.
Also, it seems best to me to start with the low-level API, and my recommendation for now would be: code for one device at a time.
Also, if it’s multi-threaded, then binding all the time is of course not needed, just once per thread, like in my example above (a quick sketch follows below).
Finally, if the programmer does want to make API calls for any device, he/she could also wrap the context stuff himself/herself
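A rough sketch of that bind-once-per-thread variant, assuming one device per host thread and C++11 std::thread for the threading; error checking omitted:

#include <cuda.h>
#include <thread>
#include <vector>

static void deviceWorker(int ordinal)
{
    CUdevice dev;
    CUcontext ctx;
    cuDeviceGet(&dev, ordinal);
    cuCtxCreate(&ctx, 0, dev);   // bind once; the context stays current
                                 // for this thread's whole lifetime
    // ... allocate, load modules, launch kernels for this device ...
    cuCtxDestroy(ctx);
}

int main()
{
    cuInit(0);
    int count = 0;
    cuDeviceGetCount(&count);

    std::vector<std::thread> workers;
    for (int i = 0; i < count; ++i)
        workers.emplace_back(deviceWorker, i);   // one thread per device
    for (std::thread &w : workers)
        w.join();
    return 0;
}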
I even went as far as including critical sections in the enter-context/leave-context calls… but I ended up removing that… critical sections are not needed for single-threaded programming, and they also made the lower-level driver API wrapper too complex for my taste…
For now I don’t think such features are needed; I also like to give the programmer using the framework more control over his own critical sections… you want to do multi-threading? → get your own critical sections and secure it, lol :) Not so difficult… my device object could provide a critical section just so one doesn’t need to be created… but even the creation code is not that much…
High-level frameworks could be created which deal with all of that, though… but then everything becomes a bit more vague…
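For what it’s worth, a rough sketch of what such enter/leave wrappers with a built-in critical section might look like; the class name is made up, and std::mutex stands in for the platform’s critical section:

#include <cuda.h>
#include <mutex>

class DeviceContext   // hypothetical wrapper, names made up
{
public:
    explicit DeviceContext(CUdevice dev)
    {
        cuCtxCreate(&mCtx, 0, dev);
        cuCtxPopCurrent(NULL);   // don't leave it current on creation
    }
    ~DeviceContext() { cuCtxDestroy(mCtx); }

    void Enter()      // lock first, then push the context, so only
    {                 // one host thread is inside this context at a time
        mLock.lock();
        cuCtxPushCurrent(mCtx);
    }

    void Leave()      // pop the context, then release the lock
    {
        cuCtxPopCurrent(NULL);
        mLock.unlock();
    }

private:
    CUcontext mCtx;
    std::mutex mLock;
};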
CUDA 4.0 handles all of this for you now anyway. Contexts are thread-safe and can be used by multiple threads at the same time.
Perhaps you mean the “CUDA runtime API 4.0”.
As far as I know, the “CUDA driver API 4.0” still requires manual context switching/threading and all of that.
I have seen no documentation which would make me believe otherwise.
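For contrast, a minimal sketch of the runtime-API style the previous poster may be referring to, where a thread just selects a device and the runtime manages the context behind the scenes; the kernel and function names are placeholders:

#include <cuda_runtime.h>

__global__ void myKernel(float *data) { /* placeholder kernel */ }

void runOnDevice(int ordinal, float *hostData, size_t n)
{
    cudaSetDevice(ordinal);              // no explicit context handling
    float *d = NULL;
    cudaMalloc((void **)&d, n * sizeof(float));
    cudaMemcpy(d, hostData, n * sizeof(float), cudaMemcpyHostToDevice);
    myKernel<<<1, 256>>>(d);
    cudaDeviceSynchronize();
    cudaFree(d);
}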
Euhm… I just looked at my code again… for something else… and the critical sections are still inside the CUDA context wrappers… which is probably a good thing… this allows multi-threading and multi-device support in one go.
I did remove the context stuff from module loading and such… that’s where I thought some complexity was unnecessary.
So enter context, load modules, leave context.
It’s nice to have one entry point and one exit point for the context… that way all other APIs and frameworks can simply assume a context is present… no further switching required… (sketch below).
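A minimal sketch of that single entry/exit discipline, assuming the context was created earlier; the module file and kernel names are placeholders:

#include <cuda.h>

void loadKernels(CUcontext ctx, CUmodule *mod, CUfunction *fn)
{
    cuCtxPushCurrent(ctx);                     // the one entry point
    cuModuleLoad(mod, "kernels.cubin");        // no switching of its own:
    cuModuleGetFunction(fn, *mod, "myKernel"); // a context is simply present
    cuCtxPopCurrent(NULL);                     // the one exit point
}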