I’d like to design a CUDA processing system that dispatches work from one or more threads on a single node (context) to one or more GPU devices.
Each thread (i.e., a task) uses a single GPU for its CUDA calculations. Each GPU is a resource that must be reserved for a single task at any given time.
Currently I have multiple threads from a single process competing for a single GPU, and this causes an “unspecified launch failure” error.
I’d like to queue each piece of work onto an unoccupied GPU device, or wait until a device becomes free before scheduling a new task.
I’m quite certain that I need a manager that decides which task may access which device, puts tasks on hold, schedules work, etc.
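For illustration, the “one task per GPU, wait until a device is free” policy described above can be sketched as a small in-process device pool. This is a minimal sketch under stated assumptions, not an established library: `GpuPool`, `GpuLease`, `DemoResult` and `run_demo` are hypothetical names, each task is assumed to run on its own host thread, and the real CUDA work (calling `cudaSetDevice` on the leased ID and launching kernels) is only indicated by a comment so the code compiles with a plain C++ compiler.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical device pool: hands out each GPU ID to at most one task at a time.
class GpuPool {
public:
    explicit GpuPool(int device_count) {
        assert(device_count > 0);
        for (int id = 0; id < device_count; ++id) free_.push_back(id);
    }

    // Blocks until some device is free, then reserves it for the caller.
    int acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !free_.empty(); });
        int id = free_.front();
        free_.pop_front();
        return id;
    }

    // Returns a device to the pool and wakes one waiting task.
    void release(int id) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            free_.push_back(id);
        }
        cv_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<int> free_;
};

// RAII lease: the device goes back to the pool even if the task throws.
class GpuLease {
public:
    explicit GpuLease(GpuPool& pool) : pool_(pool), id_(pool.acquire()) {}
    ~GpuLease() { pool_.release(id_); }
    GpuLease(const GpuLease&) = delete;
    GpuLease& operator=(const GpuLease&) = delete;
    int device() const { return id_; }

private:
    GpuPool& pool_;
    int id_;
};

struct DemoResult {
    int completed;
    bool overlap;  // true if two tasks ever held the same device at once
};

// Runs task_count simulated tasks on device_count simulated GPUs.
DemoResult run_demo(int device_count, int task_count) {
    GpuPool pool(device_count);
    std::vector<int> holders(device_count, 0);  // tasks currently on each device
    std::mutex check_mutex;                      // guards holders
    std::atomic<int> completed{0};
    std::atomic<bool> overlap{false};

    std::vector<std::thread> workers;
    for (int t = 0; t < task_count; ++t) {
        workers.emplace_back([&] {
            GpuLease lease(pool);
            // Real code would call cudaSetDevice(lease.device()) here and
            // launch its kernels; this sketch only simulates the work.
            {
                std::lock_guard<std::mutex> g(check_mutex);
                if (++holders[lease.device()] > 1) overlap = true;
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
            {
                std::lock_guard<std::mutex> g(check_mutex);
                --holders[lease.device()];
            }
            ++completed;
        });  // lease released here, waking a waiting task
    }
    for (auto& w : workers) w.join();
    return {completed.load(), overlap.load()};
}
```

Calling `run_demo(2, 8)` starts eight tasks that share two simulated devices; extra tasks block in `acquire()` until a lease is released, so no device is ever held by two tasks at once. In a real application the device count would come from `cudaGetDeviceCount`. A condition variable is used rather than polling so that waiting tasks consume no CPU while every device is busy.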
Here are my questions:
What is the best way to design such a system?
Should I integrate my application with a third-party scheduling mechanism?
Is there a supported library for such a case?
Is there a related CUDA product for such a case? PhysX has a CUDA scheduler.
Will the design support a future multi-context scheduling system using the same methodology?
Related posts:
I have noticed the TORQUE Resource Manager for multi-process (multi-context) scheduling in the following post. It doesn’t seem to be what I’m looking for.
Likewise, I have noticed some related products, such as ArrayFire multi-GPU.
I’m quite confused and don’t really know where to start, but I’m pretty sure NVIDIA has already encountered this problem. Any help would be much appreciated.