This bug is erratic, so I can't provide an isolated repro case, but there clearly seems to be a serious problem with unified memory in the driver.
If I allocate a block with regular C malloc, everything runs fine for a while, then all of a sudden a write to the middle of that block fails with an invalid address error. When I view the memory, sure enough, there is a large region right in the middle of the block that is invalid. It is not garbage; the addresses themselves are invalid, so even the debugger can't view them. This is in the middle of a regular host memory block allocated with malloc.
I am using a lot of page-locked memory, but reducing the amount did not help. All cudaMallocHost and cudaMalloc calls succeed. There are no kernel errors and no additional mallocs.
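One way to pin down where the invalid region starts, without faulting on it, would be to probe the block page by page. The following is only a minimal sketch under the assumption of a Linux host, where mincore() reports ENOMEM for unmapped pages; it will not work on Windows. The helper name first_unmapped_offset is made up for illustration and is not part of any CUDA API or of the application described here.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (Linux assumed): returns the byte offset from buf of
 * the first page in [buf, buf + len) that is no longer mapped, or -1 if
 * every page is still mapped. mincore() fails with ENOMEM on an unmapped
 * range instead of raising a fault, so the probe itself is safe. */
long first_unmapped_offset(const void *buf, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* mincore() requires a page-aligned start address. */
    uintptr_t base  = (uintptr_t)buf;
    uintptr_t start = base & ~((uintptr_t)page - 1);
    uintptr_t end   = base + len;
    unsigned char vec; /* residency flag for the single probed page */

    for (uintptr_t p = start; p < end; p += (uintptr_t)page) {
        if (mincore((void *)p, (size_t)page, &vec) == -1 && errno == ENOMEM) {
            /* Clamp to 0 if the hole begins in the page that contains buf. */
            return p < base ? 0 : (long)(p - base);
        }
    }
    return -1;
}
```

Calling this on the malloc'd block after each CUDA call, and logging the first call after which it stops returning -1, should narrow down which operation unmaps the pages.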
Any ideas anyone?