Running CUDA on a multicore system

Hi all.

I have some questions about running CUDA on a multicore system (Intel Core 2 Duo with WinXP in my case). Unfortunately I failed to find anything about this topic in the programming guide.

While running several threads on a multicore system (i.e. on the host):

  1. Can I invoke CUDA methods (such as cudaMemcpy() or a kernel launch) from different host threads? What is the result of such invocations?

  2. Should these calls be synchronized (i.e. can there be problems if I accidentally invoke cudaMemcpy() from different threads simultaneously)?

  3. While running CUDA in one thread (this question applies to single-core systems too), what is the result of an invocation like this:

kernel1<<< Dg, Db, Ns >>>(parameter);

kernel2<<< Dg, Db, Ns >>>(parameter);

Is it guaranteed that kernel2 will be executed after kernel1? (I have some doubts, since kernel invocations are declared to be asynchronous.)

Thank you.

You cannot call CUDA methods from different threads, unless each thread is accessing a different device. If your application has multiple threads, then you will probably want to make a special CUDA service thread, and that thread will be the only one to call CUDA methods.
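For example, here is a minimal sketch of what such a service thread could look like. This is just an illustration, not code from the CUDA SDK: the kernel scale_kernel, the WorkItem struct, and the queue are made up, and it uses C++11 std::thread for brevity (on WinXP you would use Win32 threads, but the structure is the same). The important point is that every CUDA runtime call happens on the single worker thread; other threads only push work onto a queue and wait for the result.

// Minimal sketch of a dedicated "CUDA service thread".
// All CUDA runtime calls happen on one thread; other threads only
// submit work items and wait until they are marked done.
#include <cuda_runtime.h>
#include <cstdio>
#include <queue>
#include <mutex>
#include <condition_variable>
#include <thread>

__global__ void scale_kernel(float* data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

struct WorkItem {
    float* host_data;   // buffer to process in place
    int    n;
    float  factor;
    bool   done;
};

std::queue<WorkItem*>   g_queue;
std::mutex              g_mutex;
std::condition_variable g_cv;
bool                    g_shutdown = false;

// The only thread that ever touches the CUDA runtime.
void cuda_worker()
{
    for (;;) {
        WorkItem* item = 0;
        {
            std::unique_lock<std::mutex> lock(g_mutex);
            g_cv.wait(lock, [] { return g_shutdown || !g_queue.empty(); });
            if (g_shutdown && g_queue.empty()) return;
            item = g_queue.front();
            g_queue.pop();
        }
        float* d = 0;
        size_t bytes = item->n * sizeof(float);
        cudaMalloc(&d, bytes);
        cudaMemcpy(d, item->host_data, bytes, cudaMemcpyHostToDevice);
        scale_kernel<<<(item->n + 255) / 256, 256>>>(d, item->factor, item->n);
        // The device-to-host copy also waits for the kernel to finish.
        cudaMemcpy(item->host_data, d, bytes, cudaMemcpyDeviceToHost);
        cudaFree(d);
        {
            std::lock_guard<std::mutex> lock(g_mutex);
            item->done = true;
        }
        g_cv.notify_all();
    }
}

int main()
{
    std::thread worker(cuda_worker);

    float data[256];
    for (int i = 0; i < 256; ++i) data[i] = 1.0f;
    WorkItem item = { data, 256, 2.0f, false };

    {   // any application thread can submit work like this
        std::lock_guard<std::mutex> lock(g_mutex);
        g_queue.push(&item);
    }
    g_cv.notify_all();

    {   // wait for the service thread to finish the item
        std::unique_lock<std::mutex> lock(g_mutex);
        g_cv.wait(lock, [&] { return item.done; });
    }

    {   // ask the worker to shut down
        std::lock_guard<std::mutex> lock(g_mutex);
        g_shutdown = true;
    }
    g_cv.notify_all();
    worker.join();

    printf("data[0] = %f\n", data[0]);  // expect 2.0
    return 0;
}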

Yes, kernel2 will be executed after kernel1.

When you launch kernel2, your host thread will block until kernel1 has finished executing. Once kernel1 is done, kernel2 will be launched and control will be returned to your thread.

It is like calling cudaThreadSynchronize() before launching the second kernel.

BTW, cudaMemcpy() between host and device memory has similar behaviour: it blocks until the kernel completes.
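To make the ordering concrete, here is a small self-contained sketch (the kernels step1/step2 and the buffer size are made up for illustration). Kernels launched from the same host thread execute in the order they were issued, so step2 can safely read what step1 wrote, and the final cudaMemcpy back to the host does not return until both kernels have completed.

// Sketch: two kernels issued back to back from one host thread
// execute in issue order; the final copy waits for both of them.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void step1(int* buf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = i;            // first kernel writes the data
}

__global__ void step2(int* buf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] += 1;           // second kernel reads step1's result
}

int main()
{
    const int n = 1024;
    int* d_buf = 0;
    cudaMalloc(&d_buf, n * sizeof(int));

    step1<<<(n + 255) / 256, 256>>>(d_buf, n);   // issued first, runs first
    step2<<<(n + 255) / 256, 256>>>(d_buf, n);   // runs only after step1 has finished

    int h_buf[n];
    // Blocks until both kernels have completed, then copies the result.
    cudaMemcpy(h_buf, d_buf, n * sizeof(int), cudaMemcpyDeviceToHost);

    printf("h_buf[10] = %d\n", h_buf[10]);       // expect 11
    cudaFree(d_buf);
    return 0;
}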

I see. Thank you.