Hi David,

Thank you for your informative answer. To answer one of your questions: the doubling of the number of vertices is due to higher-resolution sampling of the (more or less) same shapes.

I found out that the slow-down is related to my use of the acos function, which I was using to compute the rays' angles.

So my rays’ angles have this form:

the i-th ray's angle = theta + i*alpha, and I'm given cos(theta), sin(theta), and alpha.

So I lazily and naively used acos to compute theta from cos(theta) like this:

theta = acos(cos(theta)) if sin(theta) >= 0, and theta = -acos(cos(theta)) otherwise.

After that I computed the i-th ray's angle from the formula above, and then cos(angle) and sin(angle) to pass to optixTrace.

This works fine in the low-resolution case but slows down (10x) in the high-resolution case, when the number of primitives doubles. Then I realized that I can compute each ray's angle without acos at all, by rotating v = (cos(theta), sin(theta)) by the angle i*alpha. With that change, the high-resolution case is no longer slow.

I don't suspect anything special about the implementations of acos, sin, and cos (is there?), so the fact that my old method (using acos) works fine for the low-resolution shapes but somehow slows down at higher resolution suggests that something else is wrong with my input, i.e., cos(theta) and sin(theta).

Also, in the process of profiling, I timed the other parts too: initializing the OptiX context, building the GAS, creating modules, creating program groups, linking the pipeline, and setting up the SBT. I noticed that initializing the context, building the GAS, and creating modules are the most time-consuming. Interestingly, their timings change (get faster) as I run my program repeatedly. What's the explanation for this? Is creating modules where/when the OptiX kernels are JIT-compiled?