On three different machines, my CUDA code runs significantly slower with CUDA 10.2 than with CUDA 9.2. On a machine with an NVIDIA 2080 Ti card, it is about 6 times slower. All three machines use a single graphics card. Has there been some significant change between these versions that might explain this? The code is a simple function that performs a convolution and uses cudaMalloc and related calls. I can upload it if that would help.
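
One way to narrow down a slowdown like this is to time the one-time startup, the allocation and copies, and the kernel itself separately under each toolkit version. The sketch below is not the code from this post: the kernel `conv1d`, the helper `grid_size`, the `CHECK` macro, and the problem sizes are all hypothetical stand-ins, assuming a simple 1D convolution with zero padding.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t e_ = (call);                                           \
        if (e_ != cudaSuccess) {                                           \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                         cudaGetErrorString(e_));                          \
            std::exit(1);                                                  \
        }                                                                  \
    } while (0)

// Hypothetical stand-in kernel: centered 1D convolution, zero padding at the edges.
__global__ void conv1d(const float* in, const float* filt, float* out, int n, int k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = 0.0f;
    for (int j = 0; j < k; ++j) {
        int idx = i + j - k / 2;
        if (idx >= 0 && idx < n) acc += in[idx] * filt[j];
    }
    out[i] = acc;
}

// Number of blocks needed to cover n elements (rounds up).
int grid_size(int n, int threads) { return (n + threads - 1) / threads; }

// Host-side wall-clock time in milliseconds since t0.
static double ms_since(std::chrono::steady_clock::time_point t0)
{
    auto dt = std::chrono::steady_clock::now() - t0;
    return std::chrono::duration<double, std::milli>(dt).count();
}

int main()
{
    const int n = 1 << 20, k = 9, reps = 100, threads = 256;
    const int blocks = grid_size(n, threads);
    std::vector<float> h_in(n, 1.0f), h_filt(k, 1.0f / k), h_out(n);

    // Phase 1: the first runtime call creates the context (one-time startup cost).
    auto t = std::chrono::steady_clock::now();
    CHECK(cudaFree(0));
    std::printf("startup:        %.2f ms\n", ms_since(t));

    // Phase 2: device allocation and host-to-device copies.
    float *d_in, *d_filt, *d_out;
    t = std::chrono::steady_clock::now();
    CHECK(cudaMalloc(&d_in, n * sizeof(float)));
    CHECK(cudaMalloc(&d_filt, k * sizeof(float)));
    CHECK(cudaMalloc(&d_out, n * sizeof(float)));
    CHECK(cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_filt, h_filt.data(), k * sizeof(float), cudaMemcpyHostToDevice));
    std::printf("alloc + copy:   %.2f ms\n", ms_since(t));

    // Phase 3: one warm-up launch, timed on the host and excluded from the average.
    t = std::chrono::steady_clock::now();
    conv1d<<<blocks, threads>>>(d_in, d_filt, d_out, n, k);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());
    std::printf("first launch:   %.2f ms\n", ms_since(t));

    // Phase 4: steady-state kernel time, measured with CUDA events.
    cudaEvent_t e0, e1;
    CHECK(cudaEventCreate(&e0));
    CHECK(cudaEventCreate(&e1));
    CHECK(cudaEventRecord(e0));
    for (int r = 0; r < reps; ++r)
        conv1d<<<blocks, threads>>>(d_in, d_filt, d_out, n, k);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(e1));
    CHECK(cudaEventSynchronize(e1));
    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, e0, e1));
    std::printf("kernel average: %.4f ms per launch\n", ms / reps);

    // Copy the result back and release resources.
    CHECK(cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost));
    std::printf("sample output:  %f\n", h_out[n / 2]);
    CHECK(cudaEventDestroy(e0));
    CHECK(cudaEventDestroy(e1));
    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_filt));
    CHECK(cudaFree(d_out));
    return 0;
}
```

Building the same file with each toolkit (for example `nvcc -O3 timing.cu`, where the file name is a placeholder) and comparing the four numbers should show whether the difference sits in startup, in memory operations, or in the kernel itself.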