OpenCL Issues in Linux (Nvidia 440)

Hello,

I recently changed my graphics board from a GTX 1060 to a GTX 1650S, and OpenCL code that was working without any issues no longer behaves the same way.

I’m using Ubuntu 18.04 with the Nvidia 440 drivers and CUDA Toolkit 10.2 installed.

The previous setup used Ubuntu 16.04 with the Nvidia 384.130 drivers installed.

Both setups use OpenCL 1.2.

The issue seems to be that a clEnqueueNDRangeKernel call completes on the GTX 1060 without even getting the board up to 100% usage, while on the GTX 1650S the program hangs on the clFinish call right below it.
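
A first debugging step would be to check the status codes returned by both calls instead of discarding them. Below is a minimal sketch of a decoder for the codes most relevant to a kernel launch. The function name cl_status_name is hypothetical, and the integer literals are assumed to mirror the CL_* macros in the Khronos CL/cl.h header; real host code would include that header and use the macros directly.

```c
/* Hypothetical helper: map an OpenCL status code to a readable name.
   The literals mirror the CL_* macros defined in CL/cl.h; only the codes
   most relevant to a kernel launch are listed. */
static const char *cl_status_name(int status)
{
    switch (status) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -36: return "CL_INVALID_COMMAND_QUEUE";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    case -55: return "CL_INVALID_WORK_ITEM_SIZE";
    case -63: return "CL_INVALID_GLOBAL_WORK_SIZE";
    default:  return "unlisted status code";
    }
}
```

Printing cl_status_name(err) for the value returned by clEnqueueNDRangeKernel, and again for the value returned by clFinish, would show whether the launch itself was rejected or whether the kernel was accepted and never finishes.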

My theory is that a high number of kernels is being launched and that this is somehow throttling the board, but I have no idea why that would happen.

Any ideas on what could be happening, how to debug or solve it?

Update: Changing the global and local work sizes (to a large value and to 1, respectively) seems to yield results, but a lot more slowly than on the previous board. Any local work size above 1 passed to clEnqueueNDRangeKernel causes the clFinish call to hang.
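
One thing worth ruling out with the work sizes: in OpenCL 1.2 the global work size must be evenly divisible by the local work size, otherwise clEnqueueNDRangeKernel is expected to return CL_INVALID_WORK_GROUP_SIZE. The sketch below shows one common way to pad the global size. The name round_up_global and its parameters are hypothetical and not taken from my application, and this may well not be the cause of the hang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round the number of work-items up to the next
   multiple of the local work size, so the global size divides evenly. */
static size_t round_up_global(size_t n_items, size_t local_size)
{
    assert(local_size > 0); /* a zero local size would divide by zero */
    size_t remainder = n_items % local_size;
    return remainder == 0 ? n_items : n_items + (local_size - remainder);
}
```

For example, 1000 items with a local size of 64 would be launched with a global size of 1024. The kernel then needs a bounds check such as `if (get_global_id(0) >= n_items) return;`, with n_items passed as an extra kernel argument, so the padding work-items do not read or write out of range.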