Converting an OpenGL compute shader to CUDA

Hello,
I would like to convert an OpenGL compute shader to a CUDA kernel.
The compute shader uses the GL_NV_shader_atomic_fp16_vector extension to perform a bunch of atomic adds on a 3D texture like so:

imageAtomicAdd(img, coords, f16vec4(values));

As far as I’m aware, CUDA does not expose this functionality, but PTX does with the sured instruction.
However, this is not available for the float16 type if my understanding is correct.
Is there any way to have access to this very useful feature in cuda?
Is it possible to use inline sass instructions to do so?
What would the syntax be?
Thanks for your help.

NVIDIA does not make any tools for programming at the SASS (machine code) level publicly available. The lowest supported level of programming available to CUDA programmers is inline PTX code, where PTX is a portable virtual ISA that does double duty as a compiler intermediate format. PTX is compiled into SASS by ptxas, so the control available to programmers via PTX is reduced when compared to inlining of classical assembly language code that mostly translates one-to-one into machine instructions.

Even if “sured” supported fp16, would it offer you what you need, given there is no mention of atomic in the sured PTX ISA entry?

Thanks for the answers so far.
After some digging, it seems that even checking the SASS of an OpenGL shader is not possible, so I have no way to see how floating-point image atomics are translated to SASS instructions.

Yes. I do not care about the returned value so a surface reduction operation is what I’m looking for.

According to this document, there are two sass instructions called SUATOM and SURED that perform surface atomic and surface reduction operations. It is unclear whether these instructions support floating-point operands though since I can’t look at the compiled sass of my compute shader.

OpenGL and Vulkan both support a wide range of floating-point atomic / reduction operations on surface memory but these functionalities do not seem to be available with cuda.
Where should I ask for a feature request?

I know that I can use regular atomicAdds on half2 values in linear memory as a workaround but my use case really benefits from the 3D data locality offered by textures and surfaces.
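For reference, a minimal sketch of that linear-memory workaround, assuming a device of compute capability 6.0 or higher (where atomicAdd on __half2 is available) and a build such as nvcc -arch=sm_70. The Voxel layout, the splat kernel, and the splat_demo helper are hypothetical names chosen for illustration. Each voxel is updated with two independent half2 atomics, so the four components are not updated as one atomic unit, which should be acceptable for pure accumulation.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical layout: one voxel holds four halves as two __half2 pairs,
// mirroring an RGBA16F texel but stored in plain linear memory.
struct Voxel { __half2 xy; __half2 zw; };

// Each thread adds `value` to one voxel. The coordinate computation is a
// stand-in for whatever the original compute shader passes as `coords`.
__global__ void splat(Voxel* vol, int3 dim, float4 value, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int3 c = make_int3(i % 2, 0, 0);  // assumption: demo coordinates only
    if (c.x >= dim.x || c.y >= dim.y || c.z >= dim.z) return;

    // Row-major linear index; this loses the 3D locality of a surface.
    size_t idx = (size_t(c.z) * dim.y + c.y) * dim.x + c.x;

    // Two separate 32-bit atomics, each adding a pair of halves.
    atomicAdd(&vol[idx].xy, __floats2half2_rn(value.x, value.y));
    atomicAdd(&vol[idx].zw, __floats2half2_rn(value.z, value.w));
}

// Host helper: runs the kernel and returns the x component of voxel (0,0,0).
float splat_demo()
{
    const int3 dim = make_int3(4, 4, 4);
    const int n = 16;  // threads 0, 2, 4, ... all hit voxel (0,0,0)
    const size_t bytes = sizeof(Voxel) * dim.x * dim.y * dim.z;

    Voxel* vol = nullptr;
    cudaMalloc(&vol, bytes);
    cudaMemset(vol, 0, bytes);  // all-zero bits are +0.0 in fp16

    splat<<<1, 32>>>(vol, dim, make_float4(0.5f, 1.0f, 2.0f, 4.0f), n);
    cudaDeviceSynchronize();

    Voxel first;
    cudaMemcpy(&first, vol, sizeof(Voxel), cudaMemcpyDeviceToHost);
    cudaFree(vol);
    return __half2float(first.xy.x);
}
```

Calling splat_demo() from main is enough to try it out; error checking is omitted to keep the sketch short.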

The whole reason I would like to switch my compute shaders to CUDA is that there is a large overhead for each glDispatchCompute call. I have a bunch of compute shaders and CUDA kernels that must run back to back as quickly as possible. I’m also paying a price for the CUDA/OpenGL interop since the compute shaders and CUDA kernels run in an interleaved fashion. Currently, each GPU kernel runs for only a few microseconds and the GPU is kept busy only about 30-50% of the time.
Thus, my goal is to create a cuda graph to minimize kernel launch overheads as much as possible but the compute shaders need to be converted to cuda kernels.
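As a rough illustration of what I have in mind, here is a minimal sketch that records two back-to-back kernels into a CUDA graph through stream capture and then replays the graph. The kernels stage_a and stage_b and the helper graph_demo are hypothetical placeholders for the real pipeline, and cudaGraphInstantiateWithFlags assumes CUDA 11.4 or newer.

```cuda
#include <cuda_runtime.h>

// Placeholder kernels standing in for the converted compute shaders.
__global__ void stage_a(float* d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

__global__ void stage_b(float* d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

// Captures stage_a followed by stage_b once, replays the graph `replays`
// times, and returns element 0 of the buffer.
float graph_demo(int replays)
{
    const int n = 256;
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));
    cudaDeviceSynchronize();  // make sure the memset is done before capture

    cudaStream_t s;
    cudaStreamCreate(&s);

    // Record the launches instead of executing them.
    cudaGraph_t graph;
    cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal);
    stage_a<<<1, n, 0, s>>>(d, n);
    stage_b<<<1, n, 0, s>>>(d, n);
    cudaStreamEndCapture(s, &graph);

    // Instantiate once, then launch the whole sequence with a single call.
    cudaGraphExec_t exec;
    cudaGraphInstantiateWithFlags(&exec, graph, 0);
    for (int r = 0; r < replays; ++r)
        cudaGraphLaunch(exec, s);
    cudaStreamSynchronize(s);

    float h = 0.0f;
    cudaMemcpy(&h, d, sizeof(float), cudaMemcpyDeviceToHost);

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(s);
    cudaFree(d);
    return h;
}
```

Each replay applies x = (x + 1) * 2 to every element, so the result after several replays shows that both kernels ran in order on every launch. Error checking is left out for brevity.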

Feature requests can be filed using the procedure outlined in the “How to report a bug” post pinned at the top of this forum.