So I’ve stared at this for a while now.
The full code is pretty lengthy, so I’ll try to demonstrate what I’m doing with a trimmed-down piece of it. It queues up lots of kernels across streams and runs through them:
cuDoubleComplex **submesh, **submesh_t;

for(int stream = 0; stream < total_streams; ++stream){
    int last = min; // Defined previously
    for(int wp = min; wp <= max; ++wp){
        kernel1<<<1, 64, 0, streams[stream]>>>
            (submesh[stream], submesh_t[stream]);
    }
    transfer_kernel<<<dimBlock, dimGrid, 0, streams[stream]>>>
        (submesh_t[stream], wtransfer, 2, subgrid_size);
}
// More post-processing after this...
Here’s the odd part: all the kernels launched inside the inner loop seem to run just fine, but the kernel launched in the outer loop always crashes. I’ve even reduced transfer_kernel to an empty body, so the CUDA runtime is launching an empty kernel with the block/grid parameters I pass in, yet it still fails with a really cryptic error, and cuda-memcheck reports:
========= Fatal UVM CPU fault due to invalid operation
========= during read access to address 0x1357c92000
=========
========= ERROR SUMMARY: 1 error
Everything passed into these kernels is allocated with unified memory; anything that isn’t is just a plain scalar like an int or double, passed as an ordinary parameter.
I’ve never seen this error before and googling around yields nothing. I am running nowhere near the maximum memory of the card. Any suggestions would be appreciated.