Converting opengl compute shader to cuda

utilisateur2281 · March 29, 2024, 1:41pm

Hello,
I would like to convert an opengl compute shader to a cuda kernel.
The compute shader uses the GL_NV_shader_atomic_fp16_vector extension to perform a bunch of atomic adds on a 3D texture, like so:

imageAtomicAdd(img, coords, f16vec4(values));

As far as I’m aware, cuda does not expose this functionality but ptx does with the sured instruction.
However, this is not available for the float16 type if my understanding is correct.
Is there any way to have access to this very useful feature in cuda?
Is it possible to use inline sass instructions to do so?
What would the syntax be?
Thanks for your help.

njuffa · March 29, 2024, 7:26pm

NVIDIA does not make any tools for programming at the SASS (machine code) level publicly available. The lowest supported level of programming available to CUDA programmers is inline PTX code, where PTX is a portable virtual ISA that does double duty as a compiler intermediate format. PTX is compiled into SASS by ptxas, so the control available to programmers via PTX is reduced when compared to inlining of classical assembly language code that mostly translates one-to-one into machine instructions.
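
To illustrate the inline PTX route described above, here is a minimal sketch of a packed fp16 reduction issued through inline PTX. It targets linear global memory, not a surface, so it is not a replacement for imageAtomicAdd. The names redAddHalf2, accumulateRed and run_red_demo are made up for this example, and it assumes a device of compute capability 6.0 or higher (for instance, compiled with nvcc -arch=sm_70).

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical helper: packed fp16 reduction (no return value) via inline PTX.
// Assumes addr points to global memory and the target is sm_60 or newer.
__device__ __forceinline__ void redAddHalf2(__half2* addr, __half2 v)
{
    // Reinterpret the two packed halves as one 32-bit register operand.
    unsigned int bits = *reinterpret_cast<unsigned int*>(&v);
    asm volatile("red.global.add.noftz.f16x2 [%0], %1;"
                 :
                 : "l"(addr), "r"(bits)
                 : "memory");
}

// Every thread adds (1.0, 0.5) to the same packed location.
__global__ void accumulateRed(__half2* target)
{
    redAddHalf2(target, __floats2half2_rn(1.0f, 0.5f));
}

// Host driver: launches 32 threads and returns the accumulated pair.
// Error checking is omitted to keep the sketch short.
float2 run_red_demo()
{
    __half2* d = nullptr;
    cudaMalloc(&d, sizeof(__half2));
    cudaMemset(d, 0, sizeof(__half2));   // all-zero bits are +0.0 in fp16
    accumulateRed<<<1, 32>>>(d);
    __half2 h;
    cudaMemcpy(&h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return make_float2(__half2float(h.x), __half2float(h.y));
}
```

Because ptxas decides the final machine code, there is no guarantee about which SASS instruction this ends up as; checking the output of cuobjdump -sass on the compiled binary is the way to find out.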

rs277 · March 29, 2024, 9:34pm

Even if “sured” supported fp16, would it offer you what you need, given there is no mention of atomic in the sured PTX ISA entry?

utilisateur2281 · March 30, 2024, 11:17am

Thanks for the answers so far.
After some digging, it seems that even inspecting the SASS of an OpenGL shader is not possible, so I have no way to see how floating-point image atomics are translated to SASS instructions.

To rs277's question: yes. I do not care about the returned value, so a surface reduction operation is what I'm looking for.

According to this document, there are two sass instructions called SUATOM and SURED that perform surface atomic and surface reduction operations. It is unclear whether these instructions support floating-point operands though since I can’t look at the compiled sass of my compute shader.

OpenGL and Vulkan both support a wide range of floating-point atomic / reduction operations on surface memory but these functionalities do not seem to be available with cuda.
Where should I ask for a feature request?

I know that I can use regular atomicAdds on half2 values in linear memory as a workaround but my use case really benefits from the 3D data locality offered by textures and surfaces.
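
For reference, the linear-memory workaround mentioned above could look roughly like the sketch below. It emulates one f16vec4 add with two atomicAdd calls on __half2 values. The Voxel struct, the x-fastest indexing and all function names are assumptions made for this example, and it needs compute capability 6.0 or higher. Note that the two adds are separate operations, so the four components are not updated as one indivisible unit.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical layout: one RGBA16F voxel stored as two packed half2 values.
struct Voxel { __half2 rg; __half2 ba; };

// Emulates imageAtomicAdd(img, coords, f16vec4(values)) on a linear 3D grid.
__device__ void voxelAtomicAdd(Voxel* grid, int3 dim, int3 c, float4 v)
{
    size_t idx = (static_cast<size_t>(c.z) * dim.y + c.y) * dim.x + c.x;
    atomicAdd(&grid[idx].rg, __floats2half2_rn(v.x, v.y));
    atomicAdd(&grid[idx].ba, __floats2half2_rn(v.z, v.w));
}

// Every thread adds (1, 2, 3, 4) to voxel (1, 2, 3).
__global__ void accumulate(Voxel* grid, int3 dim, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    voxelAtomicAdd(grid, dim, make_int3(1, 2, 3), make_float4(1.f, 2.f, 3.f, 4.f));
}

// Host driver: 64 threads accumulate into a 4x4x4 grid; returns voxel (1, 2, 3).
// Error checking is omitted to keep the sketch short.
float4 run_voxel_demo()
{
    const int3 dim = make_int3(4, 4, 4);
    const size_t count = static_cast<size_t>(dim.x) * dim.y * dim.z;
    Voxel* d = nullptr;
    cudaMalloc(&d, count * sizeof(Voxel));
    cudaMemset(d, 0, count * sizeof(Voxel));   // all-zero bits are +0.0 in fp16
    accumulate<<<1, 64>>>(d, dim, 64);
    std::vector<Voxel> h(count);
    cudaMemcpy(h.data(), d, count * sizeof(Voxel), cudaMemcpyDeviceToHost);
    cudaFree(d);
    const Voxel& v = h[(3 * dim.y + 2) * dim.x + 1];
    return make_float4(__half2float(v.rg.x), __half2float(v.rg.y),
                       __half2float(v.ba.x), __half2float(v.ba.y));
}
```

A plain x-fastest layout like this gives up the 3D locality of a surface; a tiled or Morton-ordered index would be one way to recover some of it, at the cost of extra index arithmetic.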

The whole reason I would like to switch my compute shaders to cuda is that there is a large overhead for each glDispatchCompute call. I have a bunch of compute shaders and cuda kernels that must run back to back as quickly as possible. I'm also paying a price for the cuda/opengl interop since the compute shaders and cuda kernels run in an interleaved fashion. Currently, each gpu kernel runs for only a few microseconds and the gpu is kept busy only about 30-50% of the time.
Thus, my goal is to create a cuda graph to minimize kernel launch overheads as much as possible, but that requires converting the compute shaders to cuda kernels first.
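
The graph approach described above can be sketched with stream capture, as below. The two kernels stepA and stepB are placeholders standing in for the real short-running kernels, and run_graph_demo is a made-up driver. The sketch assumes CUDA 11.4 or newer, where cudaGraphInstantiateWithFlags is available.

```cuda
#include <cuda_runtime.h>

// Placeholder kernels standing in for the short back-to-back kernels.
__global__ void stepA(float* d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

__global__ void stepB(float* d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

// Captures stepA followed by stepB into a graph once, then replays it.
// Returns element 0 after the given number of replays.
// Error checking is omitted to keep the sketch short.
float run_graph_demo(int iterations)
{
    const int n = 256;
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));
    cudaDeviceSynchronize();              // make sure the memset is done before capture

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Record the kernel sequence; nothing executes during capture.
    cudaGraph_t graph;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    stepA<<<1, n, 0, stream>>>(d, n);
    stepB<<<1, n, 0, stream>>>(d, n);
    cudaStreamEndCapture(stream, &graph);

    cudaGraphExec_t exec;
    cudaGraphInstantiateWithFlags(&exec, graph, 0);

    // Each replay submits the whole sequence with a single launch call.
    for (int it = 0; it < iterations; ++it)
        cudaGraphLaunch(exec, stream);
    cudaStreamSynchronize(stream);

    float h = 0.0f;
    cudaMemcpy(&h, d, sizeof(float), cudaMemcpyDeviceToHost);

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(d);
    return h;
}
```

Starting from zero, each replay computes (x + 1) * 2, so three replays should leave 14 in every element. How much launch overhead this actually saves for kernels that run only a few microseconds would have to be measured.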

rs277 · March 30, 2024, 5:49pm

Feature requests can be filed using the procedure outlined in the “How to report a bug” post pinned at the top of this forum.
