Converting an OpenGL compute shader to CUDA

Thanks for the answers so far.
After some digging, it seems that it is not even possible to inspect the SASS of an OpenGL shader, so I have no way to see how floating-point image atomics are translated to SASS instructions.

Yes. I do not care about the returned value, so a surface reduction operation is what I’m looking for.

According to this document, there are two SASS instructions, SUATOM and SURED, that perform surface atomic and surface reduction operations, respectively. It is unclear whether these instructions support floating-point operands, though, since I can’t look at the compiled SASS of my compute shader.

OpenGL and Vulkan both support a wide range of floating-point atomic / reduction operations on surface memory, but this functionality does not seem to be available in CUDA.
Where should I submit a feature request?

I know that I can use regular atomicAdd calls on half2 values in linear memory as a workaround, but my use case really benefits from the 3D data locality offered by textures and surfaces.
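For reference, the linear-memory workaround I mean looks roughly like the sketch below. The kernel name, the 4x4x4 volume size and the sample-to-voxel mapping are made up for illustration; only the atomicAdd overload for __half2 is the real CUDA API. It needs compute capability 6.0 or higher (so something like nvcc -arch=sm_70), and as far as I understand the two halves are each updated atomically rather than as a single 32-bit access.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical scatter kernel: each sample adds (1.0, 0.5) to one voxel of a
// linear buffer that stands in for the 3D image.
__global__ void accumulate(__half2* volume, int3 dim, int numSamples)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numSamples) return;

    // Made-up sample-to-voxel mapping, chosen so that many threads collide.
    int voxel = i % (dim.x * dim.y * dim.z);
    int x = voxel % dim.x;
    int y = (voxel / dim.x) % dim.y;
    int z = voxel / (dim.x * dim.y);

    // Row-major linear index replacing the 3D surface coordinate.
    int idx = (z * dim.y + y) * dim.x + x;

    // The return value is ignored, so this is used as a pure reduction.
    atomicAdd(&volume[idx], __floats2half2_rn(1.0f, 0.5f));
}

// Runs the kernel and checks that every voxel received all 16 contributions.
bool runAccumulateDemo()
{
    const int3 dim = make_int3(4, 4, 4);
    const int numVoxels = dim.x * dim.y * dim.z;
    const int numSamples = numVoxels * 16;
    const size_t bytes = numVoxels * sizeof(__half2);

    __half2* dVolume = nullptr;
    if (cudaMalloc(&dVolume, bytes) != cudaSuccess) return false;
    cudaMemset(dVolume, 0, bytes);  // all-zero bits encode (0.0, 0.0)

    accumulate<<<(numSamples + 255) / 256, 256>>>(dVolume, dim, numSamples);

    std::vector<__half2> hVolume(numVoxels);
    cudaError_t err = cudaMemcpy(hVolume.data(), dVolume, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(dVolume);
    if (err != cudaSuccess) return false;

    // 16 x 1.0 and 16 x 0.5 are exactly representable in half precision.
    for (const __half2& v : hVolume) {
        if (__half2float(v.x) != 16.0f || __half2float(v.y) != 8.0f) return false;
    }
    return true;
}

int main()
{
    bool ok = runAccumulateDemo();
    std::printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

This works, but every access goes through a plain row-major index, which is exactly where I lose the locality that a 3D surface would give me.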

The whole reason I would like to switch my compute shaders to CUDA is that there is a large overhead for each glDispatchCompute call. I have a bunch of compute shaders and CUDA kernels that must run back to back as quickly as possible. I’m also paying a price for the CUDA/OpenGL interop, since the compute shaders and CUDA kernels run in an interleaved fashion. Currently, each GPU kernel runs for only a few microseconds, and the GPU is kept busy only about 30-50% of the time.
Thus, my goal is to create a CUDA graph to minimize kernel launch overheads as much as possible, but the compute shaders need to be converted to CUDA kernels first.
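To make the target concrete, here is a minimal sketch of the kind of graph I have in mind, built with stream capture. The two kernels (stepA, stepB) are placeholders for my real kernels and converted shaders, and I am assuming CUDA 11.4 or newer for cudaGraphInstantiateWithFlags.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Placeholder kernels standing in for the short back-to-back kernels.
__global__ void stepA(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

__global__ void stepB(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Captures stepA followed by stepB into a graph, replays it three times,
// and checks the result.
bool runGraphDemo()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);

    float* dData = nullptr;
    if (cudaMalloc(&dData, bytes) != cudaSuccess) return false;
    cudaMemset(dData, 0, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Record the kernel sequence once instead of launching it every frame.
    cudaGraph_t graph = nullptr;
    cudaGraphExec_t graphExec = nullptr;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    stepA<<<(n + 255) / 256, 256, 0, stream>>>(dData, n);
    stepB<<<(n + 255) / 256, 256, 0, stream>>>(dData, n);
    cudaStreamEndCapture(stream, &graph);
    if (cudaGraphInstantiateWithFlags(&graphExec, graph, 0) != cudaSuccess) {
        return false;
    }

    // One launch call per iteration replays both kernels.
    for (int iter = 0; iter < 3; ++iter) {
        cudaGraphLaunch(graphExec, stream);
    }
    cudaError_t err = cudaStreamSynchronize(stream);

    std::vector<float> hData(n);
    if (err == cudaSuccess) {
        err = cudaMemcpy(hData.data(), dData, bytes, cudaMemcpyDeviceToHost);
    }

    cudaGraphExecDestroy(graphExec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(dData);
    if (err != cudaSuccess) return false;

    // Starting from 0: (0+1)*2 = 2, (2+1)*2 = 6, (6+1)*2 = 14.
    for (float v : hData) {
        if (v != 14.0f) return false;
    }
    return true;
}

int main()
{
    bool ok = runGraphDemo();
    std::printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

As long as some of the stages are still glDispatchCompute calls, they cannot be captured like this, which is why the shaders have to be ported before the graph approach helps.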