Illegal memory access with unified memory and cudnnConvolutionForward

I am using unified memory with cuDNN. My code is written so that it can run either with or without unified memory, and so that it can run models of various sizes.

My code runs fine without unified memory on all model sizes. It also runs fine with unified memory on all but the largest models.

With unified memory on the largest models, I get an illegal memory access. By wrapping every call with cudaStreamSynchronize before and after, I have narrowed the illegal memory access down to a call to cudnnConvolutionForward with CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM. Only the kernel (filter) passed to this cuDNN call is in unified memory.

I leave about 250 MB of GPU RAM free; the kernels are all much smaller than this (by two orders of magnitude). My program makes tens of thousands of calls to cudnnConvolutionForward, and the call that triggers the illegal memory access varies from run to run. My code is carefully written to be deterministic: when this error does not occur, it always follows the same execution path with the same numbers and produces the same result.
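For reference, the bracketing I use to localize the fault looks roughly like the sketch below. The CHECK_CUDA/CHECK_CUDNN macros and the wrapper function name are my own illustration, not verbatim from my code; the descriptors and pointers come from the surrounding setup.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cudnn.h>

// Abort with file/line on any CUDA runtime error.
#define CHECK_CUDA(expr)                                                \
  do {                                                                  \
    cudaError_t err_ = (expr);                                          \
    if (err_ != cudaSuccess) {                                          \
      std::fprintf(stderr, "%s:%d CUDA error: %s\n",                    \
                   __FILE__, __LINE__, cudaGetErrorString(err_));       \
      std::abort();                                                     \
    }                                                                   \
  } while (0)

// Abort with file/line on any cuDNN status that is not SUCCESS.
#define CHECK_CUDNN(expr)                                               \
  do {                                                                  \
    cudnnStatus_t st_ = (expr);                                         \
    if (st_ != CUDNN_STATUS_SUCCESS) {                                  \
      std::fprintf(stderr, "%s:%d cuDNN error: %s\n",                   \
                   __FILE__, __LINE__, cudnnGetErrorString(st_));       \
      std::abort();                                                     \
    }                                                                   \
  } while (0)

// Every cuDNN call is bracketed by synchronizes, so a fault from earlier
// asynchronous work surfaces at the first synchronize, and a fault from
// this convolution surfaces at the second.
void checkedConvForward(cudnnHandle_t handle, cudaStream_t stream,
                        const float* alpha,
                        cudnnTensorDescriptor_t xDesc, const void* x,
                        cudnnFilterDescriptor_t wDesc, const void* w,
                        cudnnConvolutionDescriptor_t convDesc,
                        void* workspace, size_t workspaceBytes,
                        const float* beta,
                        cudnnTensorDescriptor_t yDesc, void* y) {
  CHECK_CUDA(cudaStreamSynchronize(stream));  // drain pending work
  CHECK_CUDNN(cudnnConvolutionForward(
      handle, alpha, xDesc, x, wDesc, w, convDesc,
      CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM,
      workspace, workspaceBytes, beta, yDesc, y));
  CHECK_CUDA(cudaStreamSynchronize(stream));  // fault here implicates this call
}
```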

I have looked at the arguments to cudnnConvolutionForward right before the illegal memory access. They are all reasonable and point to correctly allocated memory of the correct size.

What could be causing this? I have found that if I leave less unused GPU RAM the issue occurs more frequently, and if I leave more unused GPU RAM it occurs less frequently. But in all cases the kernels are far smaller than the amount of unused GPU RAM. No other GPU process is running on the machine.
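To clarify how I measure "unused GPU RAM": I query the runtime before the convolution loop, along these lines (a minimal standalone sketch, assuming device 0):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  size_t freeBytes = 0, totalBytes = 0;
  // cudaMemGetInfo reports free and total device memory for the current device.
  cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
  if (err != cudaSuccess) {
    std::fprintf(stderr, "cudaMemGetInfo: %s\n", cudaGetErrorString(err));
    return 1;
  }
  std::printf("free: %.1f MiB of %.1f MiB\n",
              freeBytes / (1024.0 * 1024.0), totalBytes / (1024.0 * 1024.0));
  return 0;
}
```

It is this figure, minus my own subsequent allocations, that stays around 250 MB when the failure occurs.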

This is on a Titan V with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

cudnn-9.0-linux-x64-v7.6.2.24

running Debian 9.13

I realize this setup is old, but an upgrade would be disruptive unless it were certain to fix this issue.