After compiling my OpenCL program, I can store the program binary, which on NVIDIA's implementation is PTX code representing the kernel.
Looking at the PTX output, I'm wondering whether what I am seeing is before or after optimization.
A couple of things concern me:
I see function calls that I would expect to be inlined.
I see small parameter arrays that I store in the constant address space still being accessed by index, even though I would expect each element's value to be substituted directly at the point of use.
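
To make the two patterns concrete, here is a minimal OpenCL C sketch of the kind of kernel in question. It is illustrative only: the names `coeffs`, `scale`, and `apply` are hypothetical and not taken from the actual program.

```c
/* Hypothetical example; all names are made up for illustration. */

/* Small parameter array placed in the constant address space. */
__constant float coeffs[4] = {0.25f, 0.5f, 0.75f, 1.0f};

/* Small helper that one would expect the compiler to inline. */
float scale(float x, int i)
{
    /* The index is a literal at every call site below, so the
       element value could in principle be substituted directly. */
    return x * coeffs[i];
}

__kernel void apply(__global const float *in, __global float *out)
{
    size_t gid = get_global_id(0);
    out[gid] = scale(in[gid], 0) + scale(in[gid], 3);
}
```

In terms of this sketch, the stored PTX would still contain a call to `scale` and an indexed load from `coeffs`, rather than the folded constants 0.25f and 1.0f.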
Is the optimization behavior specified anywhere?