About 6 hours into my run, cudaMalloc segfaults (under Linux) or causes very odd behavior (under Windows). It happens on the largest memory allocation in my program.
I’m fairly sure I must be doing something wrong somewhere, but I can’t work out what. For now I’m just trying to rule out possibilities.
Am I right in thinking this can’t be caused by the GPU? I would expect it to return an error code rather than segfaulting if the malloc fails. No previous calls return error codes either.
If it’s not the GPU, it must be on the host side. I’ve noticed a few memory leaks reported by valgrind in libcudart.so (one on cudaMalloc… though not the one that segfaults, and one on cudaGetDeviceCount); however, upgrading my memory from 3GB to 4GB didn’t change the behavior of the program at all. I can only think that something on the host is overflowing and overwriting something which doesn’t like being overwritten.
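One cheap way to test the host-overwrite theory is to wrap suspect host buffers in guard regions and check them periodically, for example just before each cudaMalloc. The following is only a minimal sketch under my own assumptions: `GuardedBuffer`, `kGuardBytes` and `kPattern` are hypothetical names, not part of CUDA or of my program, and the guard size and byte pattern are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical constants for illustration only.
constexpr std::size_t kGuardBytes = 64;    // size of each guard region
constexpr unsigned char kPattern = 0xA5;   // known byte written into the guards

// Hypothetical helper: a host buffer with a guard region before and after
// the payload. A small out-of-bounds write lands in a guard instead of in
// neighbouring data, and can be detected later.
class GuardedBuffer {
public:
    explicit GuardedBuffer(std::size_t payloadBytes)
        : payloadBytes_(payloadBytes),
          storage_(payloadBytes + 2 * kGuardBytes, kPattern) {
        assert(payloadBytes > 0);
    }

    // Pointer to the usable payload, just past the leading guard.
    unsigned char* data() { return storage_.data() + kGuardBytes; }
    std::size_t size() const { return payloadBytes_; }

    // Returns true while both guard regions still hold the pattern.
    bool guardsIntact() const {
        for (std::size_t i = 0; i < kGuardBytes; ++i) {
            if (storage_[i] != kPattern) return false;                               // leading guard
            if (storage_[kGuardBytes + payloadBytes_ + i] != kPattern) return false; // trailing guard
        }
        return true;
    }

private:
    std::size_t payloadBytes_;
    std::vector<unsigned char> storage_;
};
```

Calling `guardsIntact()` at regular points in the run should narrow down when an overflow happens, well before the 6 hour mark where the damage finally shows up as a crash. It only catches writes that stay within `kGuardBytes` of the buffer, so it complements valgrind rather than replacing it.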
Given that the problem takes 6 hours to show up, it’s painfully hard to work with. Any thoughts?