Do I need to lock calls like “malloc” or “new” inside the kernel?
The programming guide doesn’t mention anything about locking, and given that there really is no simple “locking” construct in CUDA to begin with, I would assume that device-side memory allocation is thread-safe. (I wonder if they use atomics to achieve this, or a segmented heap…) It even says that pointers can be passed between threads, and that any thread may free() memory allocated by another thread, as long as the same pointer is not freed twice.
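Here is a minimal sketch of what the guide describes. It assumes a GPU of compute capability 2.0 or newer (the GTX480 qualifies), and the names allocKernel, freeKernel and runDemo are hypothetical. Every thread calls malloc() with no lock, the pointers are handed across threads through a global array, and a second kernel frees each buffer from a different thread. It relies on device-heap allocations staying valid across kernel launches until they are freed.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread allocates a small buffer with device-side malloc(); no lock is used.
__global__ void allocKernel(int **slots, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    int *p = static_cast<int *>(malloc(4 * sizeof(int)));
    if (p != nullptr) {
        for (int i = 0; i < 4; ++i) p[i] = tid;  // tag the buffer with its owner's id
    }
    slots[tid] = p;  // publish the pointer (nullptr if the heap is exhausted)
}

// A different thread (the mirror index) reads and frees each buffer exactly once.
__global__ void freeKernel(int **slots, int *sums, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    int other = n - 1 - tid;  // bijection, so no slot is freed twice
    int *p = slots[other];
    int s = -1;
    if (p != nullptr) {
        s = p[0] + p[1] + p[2] + p[3];
        free(p);  // freed by a thread other than the one that allocated it
    }
    sums[tid] = s;
}

// Returns the number of threads that saw wrong contents, or -1 on a CUDA error.
int runDemo(int n)
{
    int **dSlots = nullptr;
    int *dSums = nullptr;
    if (cudaMalloc(&dSlots, n * sizeof(int *)) != cudaSuccess) return -1;
    if (cudaMalloc(&dSums, n * sizeof(int)) != cudaSuccess) { cudaFree(dSlots); return -1; }

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    allocKernel<<<blocks, threads>>>(dSlots, n);
    freeKernel<<<blocks, threads>>>(dSlots, dSums, n);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    std::vector<int> sums(n);
    if (err == cudaSuccess)
        err = cudaMemcpy(sums.data(), dSums, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dSlots);
    cudaFree(dSums);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int tid = 0; tid < n; ++tid)
        if (sums[tid] != 4 * (n - 1 - tid)) ++bad;  // 4 copies of the allocator's id
    return bad;
}

int main()
{
    int bad = runDemo(1000);
    std::printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Each slot is read and freed by exactly one thread (index n - 1 - tid), so no pointer is freed twice even though the freeing thread is usually not the allocating one.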
Thanks for your reply. I also found that the allocation fails even for a moderate size: it starts failing at ~10MB for a single-thread, single-block kernel on a GTX480. Does anyone know if this total size is configurable?
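The size of the device memory heap is configurable through cudaDeviceSetLimit with cudaLimitMallocHeapSize. The default heap is 8 MB, which explains why a ~10MB allocation fails. The limit has to be set before launching any kernel that uses malloc() or free().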
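Here is a sketch of raising the heap limit before the first malloc-using kernel runs. The sizes (128 MB heap, 64 MB allocation) and the helper names tryAlloc and allocWithHeap are illustrative assumptions. cudaDeviceSetLimit, cudaDeviceGetLimit and cudaLimitMallocHeapSize are the actual runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Single-thread kernel that tries to malloc `bytes` from the device heap
// and records whether the allocation succeeded.
__global__ void tryAlloc(size_t bytes, int *ok)
{
    char *p = static_cast<char *>(malloc(bytes));
    *ok = (p != nullptr);
    if (p != nullptr) {
        p[0] = 1;          // touch the first and last byte of the buffer
        p[bytes - 1] = 1;
        free(p);
    }
}

// Returns 1 if the allocation succeeded, 0 if malloc() returned NULL, -1 on a CUDA error.
// Assumes allocBytes > 0.
int allocWithHeap(size_t heapBytes, size_t allocBytes)
{
    // Must run before any kernel that calls malloc()/free() has been launched
    // in this context; the runtime rejects the change after that.
    if (cudaDeviceSetLimit(cudaLimitMallocHeapSize, heapBytes) != cudaSuccess) return -1;

    size_t actual = 0;
    if (cudaDeviceGetLimit(&actual, cudaLimitMallocHeapSize) != cudaSuccess) return -1;
    std::printf("device heap limit: %zu bytes\n", actual);

    int *dOk = nullptr;
    if (cudaMalloc(&dOk, sizeof(int)) != cudaSuccess) return -1;
    tryAlloc<<<1, 1>>>(allocBytes, dOk);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    int ok = 0;
    if (err == cudaSuccess)
        err = cudaMemcpy(&ok, dOk, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dOk);
    return err == cudaSuccess ? ok : -1;
}

int main()
{
    // Reserve a 128 MB heap, then allocate 64 MB from a single thread.
    int r = allocWithHeap(size_t(128) << 20, size_t(64) << 20);
    std::printf("64 MB device malloc: %s\n", r == 1 ? "ok" : "failed");
    return r == 1 ? 0 : 1;
}
```

The same 64 MB request would fail with the default 8 MB heap. Because the limit can only be set before the first malloc-using kernel, it is usually done once at program start-up.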