Running the same CUDA kernel from two different host threads, with different data

Hello,

I am using CUDA 4 with an NVIDIA Quadro 1000M.

I also have a GTX 580 card that I would like to try this on.

I have some operations that can be performed independently on different data sets. I want to spawn two pthreads that each call my CUDA code with their own data. What do I need to do in those pthreads to allow this to happen? A sketch of what I have in mind is below.
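Here is a minimal sketch of what I am trying to do (the kernel, data, and launch configuration are just placeholders for my real code):

```
#include <pthread.h>
#include <stdio.h>
#include <cuda_runtime.h>

// Placeholder kernel, stands in for my real one
__global__ void myKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

struct ThreadArgs {
    float *hostData;   // each host thread has its own input buffer
    int    n;
};

void *worker(void *p)
{
    struct ThreadArgs *args = (struct ThreadArgs *)p;

    // Is this per-thread call all that is needed on CUDA 4,
    // or does each thread need more setup than this?
    cudaSetDevice(0);

    float *d;
    cudaMalloc((void **)&d, args->n * sizeof(float));
    cudaMemcpy(d, args->hostData, args->n * sizeof(float), cudaMemcpyHostToDevice);

    myKernel<<<(args->n + 255) / 256, 256>>>(d, args->n);

    cudaMemcpy(args->hostData, d, args->n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return NULL;
}

int main(void)
{
    float a[1024] = {0}, b[1024] = {0};
    struct ThreadArgs argsA = { a, 1024 }, argsB = { b, 1024 };

    // Two host threads, each working on its own data set
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, &argsA);
    pthread_create(&t2, NULL, worker, &argsB);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```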

Will the calls to the CUDA code block on each other, or happen concurrently?

Presently, I am only calling "cudaSetDevice(0)", and it seems like one thread executes the kernels correctly while the other thread gets an error right away.

Any help is appreciated