large Buffer memory allocation?

19101 * 4831 * (sizeof(uchar4) + sizeof(float4)) = 92276931 * 20 bytes = 1845538620 bytes ≈ 1760 MB
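
To make that arithmetic easy to repeat for other grid sizes, here is a minimal C++ sketch. The names `uchar4_t`, `float4_t`, `bufferBytes` and `toMiB` are made up for this example; the two structs only stand in for CUDA's `uchar4` and `float4` so that it compiles without the CUDA headers.

```cpp
#include <cstdint>

// Stand-ins for the CUDA vector types (hypothetical names, same sizes).
struct uchar4_t { unsigned char x, y, z, w; };  // 4 bytes, like uchar4
struct float4_t { float x, y, z, w; };          // 16 bytes, like float4

static_assert(sizeof(uchar4_t) == 4,  "uchar4 stand-in must be 4 bytes");
static_assert(sizeof(float4_t) == 16, "float4 stand-in must be 16 bytes");

// Bytes needed for one uchar4 buffer plus one float4 buffer,
// each holding width * height elements.
constexpr std::uint64_t bufferBytes(std::uint64_t width, std::uint64_t height)
{
    return width * height * (sizeof(uchar4_t) + sizeof(float4_t));
}

// Convert a byte count to MB in the 1024 * 1024 sense used above.
constexpr double toMiB(std::uint64_t bytes)
{
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

Calling `bufferBytes(19101, 4831)` gives the byte count above, and `toMiB` of that lands just over 1760.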
Normally a GTX 760 has 2 GB VRAM.

That means you’ve basically filled up your board with these two buffers alone.
On top of that you need memory for the geometry attributes, their acceleration structures, and stack space, plus everything the OS itself allocates on the board for graphics. It’s not surprising that this runs out of memory.

I don’t know your algorithm, so I can’t say how to split your work into smaller pieces.

Also, what is your expected runtime per launch?
Keep in mind that a board running in Windows display driver mode (WDDM) is subject to the Windows timeout (TDR), which by default triggers when a launch runs longer than 2 seconds.

The 2 GB GTX 760 appears underpowered if you’re targeting grid sizes around 100 MPixel and are unable to split the work into digestible chunks. If splitting is not possible, then depending on the runtime per launch it might even make sense to use a dedicated Tesla board for the required compute tasks. Tesla boards run in a different driver mode (TCC), which is unaffected by the WDDM timeout.
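
Purely as an illustration of what "digestible chunks" could look like, and assuming each pixel's result is independent of the others (which I can't confirm for your algorithm), the grid could be cut into full-width row bands sized to a memory budget. Everything in this C++ sketch is hypothetical: the `Tile` struct, the helper names, and any budget you pass in.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One rectangular piece of the full launch grid (hypothetical helper type).
struct Tile { std::uint32_t x, y, width, height; };

// Largest number of full-width rows whose per-pixel buffers fit in budgetBytes.
// Returns 0 if not even a single row fits.
std::uint32_t rowsPerTile(std::uint32_t width, std::uint64_t bytesPerPixel,
                          std::uint64_t budgetBytes)
{
    const std::uint64_t bytesPerRow = std::uint64_t(width) * bytesPerPixel;
    if (bytesPerRow == 0) return 0;
    const std::uint64_t rows = budgetBytes / bytesPerRow;
    return static_cast<std::uint32_t>(std::min<std::uint64_t>(rows, 0xFFFFFFFFu));
}

// Split a width x height grid into horizontal bands of at most `rows` rows.
std::vector<Tile> splitIntoRowTiles(std::uint32_t width, std::uint32_t height,
                                    std::uint32_t rows)
{
    std::vector<Tile> tiles;
    if (rows == 0 || width == 0) return tiles;  // nothing fits, nothing to launch
    rows = std::min(rows, height);              // also keeps y += rows from overflowing
    for (std::uint32_t y = 0; y < height; y += rows)
        tiles.push_back({0u, y, width, std::min(rows, height - y)});
    return tiles;
}
```

With the 20 bytes per pixel from above and an arbitrarily chosen 512 MB budget, `rowsPerTile(19101, 20, 512ull * 1024 * 1024)` yields 1405 rows, so the 19101 x 4831 grid becomes four bands, the last one 616 rows high. Each band would then be its own launch with buffers sized for that band only, and each launch still has to finish within the TDR limit.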