I have a single CUDA device and multiple host threads. In each thread I call cuCtxCreate, cuModuleLoadData and cuModuleGetGlobal; the last one gets me a device pointer to a u32 variable in my kernel file.
Each thread gets different hContext and hModule handles from these calls but the same DevicePtr address, yet I can't move a u32 from host to device and back to host across my threads. There are no runtime errors and the copy doesn't fail, but my second thread always copies 0 back from the device.
With a single thread, everything seems to work.
Is it a problem that I’m using multiple contexts?
Edit: it looks like the way forward is dynamic device allocations together with cuCtxPopCurrent() and cuCtxPushCurrent()… The forum archive is an awesome resource!