I am trying to work with pinned, portable memory (CUDA runtime 2.3).
The example I am working from allocates pinned memory (with cudaHostAlloc((void**)&addr, n_bytes, cudaHostAllocPortable))
in the main thread, then launches separate host threads for each GPU.
That is OK. But I get problems when I try to allocate pinned/portable memory within each host thread. So:
A) Is it OK to allocate/free pinned memory within several different host threads? (And what if each had a different CUDA context?)
B) Is it OK for one thread to allocate, and another thread to free, a chunk of pinned/portable memory?
(B) may seem like a strange thing to do. Basically I have a class that caches arrays, up to a global limit on the cached data.
Cached data is freed up when the limit is reached. Cached data may be used by different threads, and any thread may do the freeing.
(Users of the data must always check to see if it has been freed up and if so recalculate the array. But if it is still there,
time is saved). Anyway, this is all made thread safe with mutexes, and works fine as long as I use “new” and “delete”.
It passes its unit tests.
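
A rough sketch of the kind of cache described above, not the actual class. Everything here is an assumption made for illustration: the names (ArrayCache, put, get), the integer keys, float arrays, and the eviction order (lowest key first). To keep the sketch safe it copies data out under the lock, whereas the real cache hands the array itself to its users.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <map>
#include <mutex>

// Hypothetical cache: stores copies of float arrays up to a global byte limit.
class ArrayCache {
public:
    explicit ArrayCache(std::size_t limit_bytes) : limit_(limit_bytes), used_(0) {}
    ~ArrayCache() {
        for (auto& kv : entries_) std::free(kv.second.ptr);
    }
    ArrayCache(const ArrayCache&) = delete;
    ArrayCache& operator=(const ArrayCache&) = delete;

    // Cache a copy of src. Older entries are freed by the calling thread
    // until the new array fits, so any thread may end up doing the freeing.
    bool put(int key, const float* src, std::size_t count) {
        assert(src != nullptr);
        const std::size_t bytes = count * sizeof(float);
        if (bytes == 0 || bytes > limit_) return false;
        std::lock_guard<std::mutex> lock(mutex_);
        erase_locked(key);  // replace any previous entry for this key
        while (used_ + bytes > limit_) erase_locked(entries_.begin()->first);
        float* p = static_cast<float*>(std::malloc(bytes));
        if (p == nullptr) return false;
        std::memcpy(p, src, bytes);
        entries_[key] = Entry{p, count};
        used_ += bytes;
        return true;
    }

    // Returns false if the entry has been freed; the caller must then
    // recalculate the array.
    bool get(int key, float* dst, std::size_t count) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = entries_.find(key);
        if (it == entries_.end() || it->second.count != count) return false;
        std::memcpy(dst, it->second.ptr, count * sizeof(float));
        return true;
    }

private:
    struct Entry { float* ptr; std::size_t count; };

    // Caller must hold mutex_.
    void erase_locked(int key) {
        auto it = entries_.find(key);
        if (it == entries_.end()) return;
        used_ -= it->second.count * sizeof(float);
        std::free(it->second.ptr);
        entries_.erase(it);
    }

    std::mutex mutex_;
    std::map<int, Entry> entries_;
    std::size_t limit_;
    std::size_t used_;
};
```

With a limit of eight floats, caching three four-float arrays frees the first one, and a later get() on it returns false so the caller knows to recalculate.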
Then I thought, why waste time and space copying that data into pinned memory before copying to the device? Why not
put it in pinned memory to start with? So I replaced “new” with “cudaHostAlloc(… cudaHostAllocPortable)” and “delete”
with “cudaFreeHost()”. Now it passes all the unit tests apart from the one that exercises thread safety, where it crashes (seg faults) randomly.
NB: in this unit test I am not even copying to the device - just repeatedly allocating, writing, reading and freeing, in
two different threads. (I call cudaSetDevice(0) before launching threads so in fact the CUDA context should be the same).
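
The failing pattern boils down to something like the sketch below, which is a stripped-down stand-in and not the actual unit test. Assumptions: std::thread for the host threads (the post does not say which threading library is used), and the hypothetical names USE_CUDA_PINNED, host_alloc, host_free, churn and run_two_threads. Without the macro it falls back to malloc/free so it builds without the CUDA toolkit; with the macro defined it uses cudaHostAlloc/cudaFreeHost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <thread>

#ifdef USE_CUDA_PINNED  // hypothetical macro: define when building against CUDA
#include <cuda_runtime.h>
static void* host_alloc(std::size_t n) {
    void* p = nullptr;
    return cudaHostAlloc(&p, n, cudaHostAllocPortable) == cudaSuccess ? p : nullptr;
}
static void host_free(void* p) { cudaFreeHost(p); }
#else
// Stand-ins so the sketch compiles and runs without CUDA.
static void* host_alloc(std::size_t n) { return std::malloc(n); }
static void host_free(void* p) { std::free(p); }
#endif

// Repeatedly allocate, write, read back and free one buffer.
// Returns the number of failed allocations or read-back mismatches.
static int churn(int iterations, std::size_t n_bytes, unsigned char tag) {
    assert(n_bytes > 0);
    int failures = 0;
    for (int i = 0; i < iterations; ++i) {
        unsigned char* p = static_cast<unsigned char*>(host_alloc(n_bytes));
        if (p == nullptr) { ++failures; continue; }
        for (std::size_t j = 0; j < n_bytes; ++j) p[j] = tag;
        for (std::size_t j = 0; j < n_bytes; ++j) {
            if (p[j] != tag) { ++failures; break; }
        }
        host_free(p);
    }
    return failures;
}

// Run the same loop in two host threads at once; no device copies involved.
int run_two_threads(int iterations, std::size_t n_bytes) {
#ifdef USE_CUDA_PINNED
    cudaSetDevice(0);  // as in the post: select the device before launching threads
#endif
    int f0 = 0, f1 = 0;
    std::thread t0([&] { f0 = churn(iterations, n_bytes, 0xA5); });
    std::thread t1([&] { f1 = churn(iterations, n_bytes, 0x5A); });
    t0.join();
    t1.join();
    return f0 + f1;
}
```

Calling run_two_threads(1000, 4096) from main() in both build modes would help separate a problem in the cache's own locking from one in the pinned allocator.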
After scratching my head a bit, I wonder if perhaps cudaHostAlloc() is not quite thread safe to the extent I need?
Though, since I am (trying to) provide thread safety with mutexes, unless CUDA is explicitly using the thread id in some
way, it should be OK.
All I can find out for sure is that pinned/portable memory can be accessed by multiple devices which are controlled
from different host threads. Whether it is safe to allocate/free in different host threads, I am not sure.
Does anyone know?