Help using single GPU among multithreaded CPU

I have a CPU with 8 threads. I want to split the resources of my GPU evenly among the threads. My GPU is capable of launching up to 65536 blocks, and I would like to set aside 8000 blocks for each CPU thread.

What is the best way to go about this? My algorithm knows which CPU thread it is on. I have two ideas for how to do this but don't know which, if either, would work.

  1. Is it possible I could specify which blocks to use on my GPU? i.e. 0-7999 for CPUThread 1, 8000-15999 for CPUThread 2…

  2. Is it possible for the different CPU threads to each call the GPU kernel method individually (but in parallel)? Example below:

__global__ void kernel( float* a, float *b, float *c, int *CPUThreadIndex )
{
    int idx = *CPUThreadIndex;  // one int per CPU thread, copied to the device beforehand
    if (idx * 8000 <= blockIdx.x && blockIdx.x < (idx + 1) * 8000)
    {
        // Do stuff on the kernel
    }
}

int main(void)
{
    // dev_a, dev_b, dev_c, and dev_CPUThreadIndex allocated with cudaMalloc
    // and filled with cudaMemcpy beforehand
    kernel<<<64000, 1>>>( dev_a, dev_b, dev_c, dev_CPUThreadIndex );
    cudaDeviceSynchronize();
    return 0;
}

If each separate CPU thread reaches the main function at around the same time, would something like the method above work in parallel?
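As a possible simplification (a hedged sketch, not from the original post): rather than launching all 64000 blocks from every CPU thread and masking out the ones that don't belong, each CPU thread could launch only its own 8000-block slice and pass a block offset. The kernel name, the placeholder work, and the `launchForCpuThread` helper are all illustrative assumptions.

```cuda
#include <cuda_runtime.h>

// Each CPU thread launches only its own 8000 blocks; blockOffset tells the
// kernel which part of the global 0..63999 block range this slice covers.
__global__ void sliceKernel(float *a, float *b, float *c, int blockOffset)
{
    int globalBlock = blockIdx.x + blockOffset;
    c[globalBlock] = a[globalBlock] + b[globalBlock];  // placeholder work
}

// Hypothetical per-thread launch helper.
void launchForCpuThread(int cpuThreadIndex,
                        float *dev_a, float *dev_b, float *dev_c)
{
    const int blocksPerCpuThread = 8000;
    sliceKernel<<<blocksPerCpuThread, 1>>>(
        dev_a, dev_b, dev_c, cpuThreadIndex * blocksPerCpuThread);
}
```

This avoids launching 56000 blocks per call that immediately fail the index check and do nothing.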

Thanks for any help you guys can provide.

Do threads belong to the same CUDA Context?

I have worked with multiple CPU threads all using the same GPU, and I have found the following:

  1. with CUDA 5 and Kepler, multiple streams belonging to the same CUDA context can share GPU resources "at the same time".

  2. if you want to share the GPU between different threads, you have to create one CUDA context for each CPU thread, but only one context at a time can use the GPU.
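Point 1 above can be sketched roughly as follows (an assumed setup, not code from the thread): each CPU thread issues its slice of work on its own non-default stream within the one shared runtime-API context. The kernel and the `workerBody` helper are illustrative.

```cuda
#include <cuda_runtime.h>

__global__ void sliceKernel(float *c, int blockOffset)
{
    c[blockIdx.x + blockOffset] = (float)blockIdx.x;  // placeholder work
}

// Body executed by each of the 8 CPU threads. Kernels launched into
// different non-default streams of the same context may run concurrently
// on Kepler-class hardware if resources allow.
void workerBody(int cpuThreadIndex, float *dev_c)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Launch only this CPU thread's 8000 blocks, offset into the data.
    sliceKernel<<<8000, 1, 0, stream>>>(dev_c, cpuThreadIndex * 8000);

    cudaStreamSynchronize(stream);  // wait only for this thread's work
    cudaStreamDestroy(stream);
}
```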

I'm sure I haven't been very clear, so maybe someone more expert can help us.

In CUDA C programming guide p.61:

Default compute mode: Multiple host threads can use the device (by calling cudaSetDevice() on this device, when using the runtime API, or by making current a context associated to the device, when using the driver API) at the same time.

I think m_colaprico is right. You can call cudaSetDevice() in each CPU thread to share the same GPU. But the kernels may not execute in parallel.
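A minimal sketch of that pattern, assuming POSIX threads and a trivial placeholder kernel (the function names and the per-thread allocation are illustrative, not from the thread): with the runtime API, every host thread that calls cudaSetDevice() on the same device shares that device's primary context automatically.

```cuda
#include <cuda_runtime.h>
#include <pthread.h>

__global__ void sliceKernel(float *c, int blockOffset)
{
    c[blockIdx.x + blockOffset] = 1.0f;  // placeholder work
}

// Hypothetical body for each of the 8 pthreads; arg points at the
// thread's index (0..7).
void *threadFunc(void *arg)
{
    int cpuThreadIndex = *(int *)arg;

    // Each host thread selects the same device; all of them then share
    // the device's primary context.
    cudaSetDevice(0);

    float *dev_c;
    cudaMalloc(&dev_c, 64000 * sizeof(float));

    // Launch only this thread's 8000-block slice.
    sliceKernel<<<8000, 1>>>(dev_c, cpuThreadIndex * 8000);
    cudaDeviceSynchronize();

    cudaFree(dev_c);
    return NULL;
}
```

Whether those launches overlap on the device is a separate question from whether the threads can all submit work.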

Thanks tzuhung!
Moreover, I have learned that you can execute multiple CPU threads in parallel on the same GPU only on a Tesla Kepler, using Hyper-Q with MPI tasks.

I will test whether it is also possible with OpenMP CPU threads.
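The OpenMP experiment could look roughly like this (a hedged sketch only; the kernel and function names are assumptions, and whether the kernels actually overlap depends on the hardware, e.g. Hyper-Q on Kepler Tesla parts):

```cuda
#include <cuda_runtime.h>
#include <omp.h>

__global__ void sliceKernel(float *c, int blockOffset)
{
    c[blockIdx.x + blockOffset] = 1.0f;  // placeholder work
}

// 8 OpenMP threads, each launching its 8000-block slice on a private
// stream within the shared runtime-API context.
void runWithOpenMP(float *dev_c)
{
    #pragma omp parallel num_threads(8)
    {
        int tid = omp_get_thread_num();

        cudaStream_t s;
        cudaStreamCreate(&s);
        sliceKernel<<<8000, 1, 0, s>>>(dev_c, tid * 8000);
        cudaStreamSynchronize(s);
        cudaStreamDestroy(s);
    }
}
```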

Does this really work?