I have a problem concerning memory allocation on the card. My main program allocates memory on the card for my input data and also copies the input data to the card's memory. The pointer I got from cudaMalloc (in the main program) is then passed as a parameter to a host thread that starts the kernel on the device. The problem is that the kernel reports an "invalid device pointer". If I move the cudaMalloc calls from the main program into the thread that launches the kernel, everything works fine.
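For reference, a minimal sketch of the pattern described above (using pthreads; the names `kernel`, `launch`, and the sizes are illustrative, not from the original post):

```cuda
#include <cuda_runtime.h>
#include <pthread.h>
#include <stdio.h>

__global__ void kernel(float *data) { data[0] = 1.0f; }

/* Worker thread: receives a device pointer that was allocated
 * by the main thread, i.e. in a different CUDA context. */
static void *launch(void *arg) {
    float *d_data = (float *)arg;
    kernel<<<1, 1>>>(d_data);  /* fails here: the pointer belongs to another context */
    printf("%s\n", cudaGetErrorString(cudaGetLastError()));
    return NULL;
}

int main(void) {
    float *d_data;
    /* Allocation happens in the main thread's context. */
    cudaMalloc((void **)&d_data, 256 * sizeof(float));

    pthread_t t;
    pthread_create(&t, NULL, launch, d_data);
    pthread_join(t, NULL);

    cudaFree(d_data);
    return 0;
}
```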
Are there any restrictions concerning cudaMalloc and different threads?
I assume that it is not possible; in my tests it doesn't work. It seems that every thread has its own, let's call it, "CUDA context". In one thread I used cuMemGetInfo() to query the memory information. The result was that my card (an 8800 GTX) reported 18MB, yet I was able to allocate 128MB with cudaMalloc(). Nice ;-)
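For anyone trying to reproduce this, a sketch of querying memory via the driver API (the modern cuMemGetInfo signature takes size_t pointers; very early CUDA releases used unsigned int):

```cuda
#include <cuda.h>
#include <stdio.h>

int main(void) {
    cuInit(0);
    CUdevice dev;
    CUcontext ctx;
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);  /* context is tied to this host thread */

    size_t free_bytes, total_bytes;
    cuMemGetInfo(&free_bytes, &total_bytes);
    printf("free: %zu MB, total: %zu MB\n",
           free_bytes >> 20, total_bytes >> 20);

    cuCtxDestroy(ctx);
    return 0;
}
```

The numbers reported apply to the context current in the calling thread, which is why two threads can disagree.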
A CUDA context is like a CPU process, in that each has its own state, memory space, etc. There is a 1-to-1 correspondence between CPU threads and CUDA contexts, so you cannot have multiple threads sharing one context, and you cannot have multiple contexts per thread.
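Under that one-context-per-thread model, the fix the original poster found, doing the allocation in the same thread that launches the kernel, can be sketched like this (names are illustrative):

```cuda
#include <cuda_runtime.h>
#include <pthread.h>

__global__ void kernel(float *data) { data[0] = 1.0f; }

/* All CUDA work happens in one host thread, hence one context:
 * allocation, copy, and launch all see the same device pointers. */
static void *worker(void *arg) {
    const float *h_input = (const float *)arg;
    float *d_data;
    cudaMalloc((void **)&d_data, 256 * sizeof(float));
    cudaMemcpy(d_data, h_input, 256 * sizeof(float), cudaMemcpyHostToDevice);
    kernel<<<1, 1>>>(d_data);
    cudaThreadSynchronize();  /* pre-CUDA-4.0 spelling of cudaDeviceSynchronize */
    cudaFree(d_data);
    return NULL;
}

int main(void) {
    float h_input[256] = {0};
    pthread_t t;
    pthread_create(&t, NULL, worker, h_input);
    pthread_join(t, NULL);
    return 0;
}
```

(Note that this one-context-per-thread behavior is how early CUDA releases worked; since CUDA 4.0 the runtime API shares a single context per device across all host threads, so device pointers can be passed between threads.)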
This means that two host threads cannot see each other's CUDA arrays or data structures.