I’m using JetPack 3.1 on a TX2 to develop a GPU-accelerated application. I noticed that the GCC atomic built-in __atomic_sub_fetch cannot be used on memory allocated with cudaHostAlloc. The following test case always causes the application to hang:
#include <cuda_runtime.h>  // cudaHostAlloc is a runtime-API call; cuda.h alone does not declare it
#include <cstdio>

int main()
{
    int* x = nullptr;
    cudaError_t err = cudaHostAlloc(&x, 64, cudaHostAllocMapped);
    if (err != cudaSuccess) {
        printf("cudaHostAlloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    __atomic_sub_fetch(&x[0], 1, __ATOMIC_RELAXED);  // hangs here
    // never gets here
    cudaFreeHost(x);
    return 0;
}
After executing this code, the following messages are visible in the kernel log (dmesg):
Hi, I had a similar error when I used cudaHostAlloc with a custom allocator and shared_ptr. Could you give a link to a document that describes this problem?
Yes, we have managed to work around this limitation. I can’t share many details, because the fix was incorporated into proprietary libraries. We use array/image objects with reference counting, implemented via atomics. The gist of it is that our array/image objects use separate memory allocations for their data vs. their headers. The data allocation can be CUDA device memory or managed memory, but the header allocation is always made from a host heap (new/malloc).