Hi, I’m seeing some very strange memory issues on the device. Here is what I’m doing:
cudaMalloc a struct
cudaMemcpy host to device to initialize the struct
pass the pointer returned by cudaMalloc to the kernel
printf the contents of the struct inside the kernel
return to the host
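To make those steps concrete, here is a minimal sketch of the pattern. The struct `Params`, its members, and the kernel name `printParams` are made-up stand-ins, not my real code, and the launch size is deliberately tiny:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical struct standing in for the real one.
struct Params {
    int   n;
    float scale;
};

// The kernel does nothing except print the struct it was handed.
__global__ void printParams(const Params *p)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    printf("thread %d: n=%d scale=%f\n", tid, p->n, p->scale);
}

int main()
{
    Params h = {1024, 0.5f};
    Params *d = nullptr;

    // Allocate the struct on the device and check the return code.
    if (cudaMalloc(&d, sizeof(Params)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Initialize the device copy from the host copy.
    if (cudaMemcpy(d, &h, sizeof(Params), cudaMemcpyHostToDevice) != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed\n");
        cudaFree(d);
        return 1;
    }

    // Pass the device pointer to the kernel (tiny launch for illustration).
    printParams<<<1, 4>>>(d);

    // Wait for the kernel so the device-side printf output is flushed.
    cudaDeviceSynchronize();

    cudaFree(d);
    return 0;
}
```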
A number of structs are copied and passed, and a couple of arrays are as well. However, I’m doing nothing in this kernel except printing the arguments that got passed into the kernel. I also don’t have any errors reported by cudaMalloc or cudaMemcpy.
For small debug-sized inputs, cuda-memcheck reports no errors and the kernel runs successfully. However, when I run my code on real problem-sized data, some of the threads print one of the struct members with a bad value. It is just one member; the other struct members seem to be fine. The kernel also doesn’t complete, and I get the ever-so-helpful error message: “Error: Process didn’t terminate successfully”. Often I can only run the code once and then have to reboot the system! I know this looks as though I’m stomping on memory, but I’ve double-checked that the sizes I pass to cudaMemcpy and cudaMalloc are OK.
What is going on here? Is it possible that I am using too much memory but cudaMalloc still succeeds? How would I determine the problem?
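For reference, this is the kind of check I could add to narrow it down. It is only a sketch: the `CUDA_CHECK` macro and the `noop` kernel are hypothetical names of my own, while `cudaMemGetInfo`, `cudaGetLastError`, `cudaDeviceSynchronize`, and `cudaGetErrorString` are standard CUDA runtime calls. It reports free device memory and checks for errors both at launch and after the kernel has actually run:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a readable message on any CUDA error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder kernel standing in for the real one.
__global__ void noop() {}

int main()
{
    // How much device memory is free right now, and how much exists in total.
    size_t freeBytes = 0, totalBytes = 0;
    CUDA_CHECK(cudaMemGetInfo(&freeBytes, &totalBytes));
    printf("device memory: %zu bytes free of %zu\n", freeBytes, totalBytes);

    noop<<<1, 32>>>();

    // Errors in the launch itself (bad configuration, etc.) show up here.
    CUDA_CHECK(cudaGetLastError());

    // Errors that occur while the kernel is executing show up here.
    CUDA_CHECK(cudaDeviceSynchronize());

    return 0;
}
```

Kernel launches are asynchronous, so a failure inside the kernel is only reported by a later synchronizing call; checking cudaMalloc and cudaMemcpy alone would not catch it.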