cudaMalloc and threads: "invalid device pointer" error

Hi all,

I have a problem with memory allocation on the card. My main program allocates memory on the card for my input data and copies the input data into it. The pointer returned by cudaMalloc (in the main program) is then passed as a parameter to a host thread that launches the kernel on the device. The problem is that the kernel launch reports an “invalid device pointer”. If I move the cudaMalloc call and the copy from the main program into the thread that launches the kernel, everything works.
Are there any restrictions concerning cudaMalloc and different threads?

Thanks in advance and best regards,

I have the same question.

Is it possible to share device memory between host threads, or even between host processes?

If cudaMalloc returns a plain pointer, it should work as long as the host threads or processes use the same board.

Any ideas or advice, please?

I assume that it is not possible; in my tests it doesn’t work. It seems that every thread has its own, let’s call it, “CUDA context”. In one thread I used cuMemGetInfo() to get the memory information. The result said that my card (an 8800 GTX) has 18MB, yet I was able to allocate 128MB with cudaMalloc(). Nice ;-)

Hello all,

A CUDA context is like a CPU process, in that each has its own state, memory space, etc. There is a 1-to-1 correspondence between CPU threads/processes and CUDA contexts, so you cannot have multiple threads per context, and you cannot have multiple contexts per thread.

This means that two host threads cannot see each other’s CUDA arrays/structures.
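One way to live with this restriction is to let a single host thread do all the device work, so that allocation, copies and kernel launches share one context. The following is only a minimal host-side sketch of that idea, assuming the one-context-per-thread behaviour described above. GpuWorker and same_thread_for_alloc_and_launch are hypothetical names invented for this illustration, and the CUDA calls appear only as comments marking where they would go.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical helper: one host thread that would own the CUDA context.
// Every task submitted here runs on that thread.
class GpuWorker {
public:
    GpuWorker() : worker_([this] { run(); }) {}

    ~GpuWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Queue a task and block until the worker thread has executed it.
    void submit(std::function<void()> task) {
        std::packaged_task<void()> job(std::move(task));
        std::future<void> finished = job.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(job));
        }
        cv_.notify_one();
        finished.get();
    }

    std::thread::id thread_id() const { return worker_.get_id(); }

private:
    void run() {
        for (;;) {
            std::packaged_task<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // shutdown requested, queue drained
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();  // executed outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::packaged_task<void()>> tasks_;
    bool done_ = false;
    std::thread worker_;  // declared last so the members above exist first
};

// Usage sketch: the allocation step and the launch step are submitted from
// the calling thread but both execute on the worker thread.
bool same_thread_for_alloc_and_launch() {
    GpuWorker gpu;
    std::thread::id alloc_thread, launch_thread;
    gpu.submit([&] {
        alloc_thread = std::this_thread::get_id();
        // cudaMalloc and cudaMemcpy of the input data would go here.
    });
    gpu.submit([&] {
        launch_thread = std::this_thread::get_id();
        // The kernel launch using that device pointer would go here.
    });
    return alloc_thread == launch_thread &&
           alloc_thread == gpu.thread_id() &&
           alloc_thread != std::this_thread::get_id();
}
```

Other host threads would then hand their work to the worker through submit() instead of using device pointers themselves. submit() blocks until the task has run, which keeps the sketch simple at the cost of serialising the callers.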

Thank you for your explanation.

Then my question is: how can device memory be shared between host threads or host processes?

Of course, one thread or process can copy the GPU memory into host RAM, and another thread or process can then load it back onto the GPU.

However, I do not think this is a good way.

On Microsoft Windows, sharing memory between processes is possible through memory mapping. I hope CUDA will have something like this (maybe in the future).