Not sure if anyone else has experienced this, but at higher image resolutions (larger fb size), using pixel_index as the sequence number slows curand_init() to a crawl. I ran into a similar problem in the following thread and suggested a solution in this comment: vary the seed instead of the sequence. Quoting the CUDA Toolkit documentation: "Sequences generated with different seeds usually do not have statistically correlated values".
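For anyone who wants to try it, here is a minimal sketch of what I mean. The kernel and helper names (render_init, draw_one, first_samples) and the base seed 1984 are my own placeholders, not necessarily what the post uses. My understanding is that a nonzero sequence makes curand_init() skip ahead in the generator, which is where the time goes, so treat that explanation as an assumption.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <curand_kernel.h>

// One RNG state per pixel. Names here are placeholders for illustration.
__global__ void render_init(int max_x, int max_y, unsigned long long base_seed,
                            curandState *rand_state) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    int j = threadIdx.y + blockIdx.y * blockDim.y;
    if (i >= max_x || j >= max_y) return;
    int pixel_index = j * max_x + i;
    // Slow variant (same seed, per-pixel sequence number):
    //   curand_init(base_seed, pixel_index, 0, &rand_state[pixel_index]);
    // Variant suggested above: per-pixel seed, sequence 0, offset 0.
    curand_init(base_seed + pixel_index, 0, 0, &rand_state[pixel_index]);
}

// Draw a single uniform sample per pixel so the states can be inspected.
__global__ void draw_one(int max_x, int max_y, curandState *rand_state, float *out) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    int j = threadIdx.y + blockIdx.y * blockDim.y;
    if (i >= max_x || j >= max_y) return;
    int pixel_index = j * max_x + i;
    curandState local = rand_state[pixel_index];  // work on a register copy
    out[pixel_index] = curand_uniform(&local);    // value in (0, 1]
    rand_state[pixel_index] = local;              // save the advanced state
}

// Host helper: initialize the states, draw one sample per pixel, copy back.
// Returns an empty vector if any CUDA call fails.
std::vector<float> first_samples(int nx, int ny, unsigned long long seed) {
    int n = nx * ny;
    curandState *d_state = nullptr;
    float *d_out = nullptr;
    if (cudaMalloc(&d_state, n * sizeof(curandState)) != cudaSuccess) return {};
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_state);
        return {};
    }
    dim3 threads(8, 8);
    dim3 blocks((nx + threads.x - 1) / threads.x, (ny + threads.y - 1) / threads.y);
    render_init<<<blocks, threads>>>(nx, ny, seed, d_state);
    draw_one<<<blocks, threads>>>(nx, ny, d_state, d_out);
    std::vector<float> host(n);
    // cudaMemcpy waits for the kernels above and reports their errors too.
    cudaError_t err = cudaMemcpy(host.data(), d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    cudaFree(d_state);
    if (err != cudaSuccess) return {};
    return host;
}

int main() {
    std::vector<float> samples = first_samples(1200, 600, 1984ULL);
    assert(!samples.empty());
    printf("first samples: %f %f %f\n", samples[0], samples[1], samples[2]);
    return 0;
}
```

The trade-off is the one in the quote: different seeds "usually" give uncorrelated sequences, which is a weaker guarantee than distinct sequence numbers under one seed, but for a toy renderer I have not seen it matter.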
I noticed that your color function doesn’t handle thread divergence. Do you have any suggestions for minimizing it?
No, for simplicity I did not do anything special for divergence. I don’t have any ready suggestions either, sorry. Perhaps others who are reading can help?