[bugreport] Writing to CUDA SurfaceObjects produces no result

devsh · March 7, 2020, 7:07pm

Hi,

My program has the exact same set-up as this one (using “bindless” surface objects introduced with Kepler):

https://stackoverflow.com/questions/58303950/writing-to-cuda-surface-from-optix-kernel

I do understand that if I dispatch a 1280x720 OptiX raygen, it won’t actually dispatch a 1280x720 kernel but some group of persistent workgroups that iterate through ray-packets, and hence more than one physical dispatch thread may end up writing to the Surface.

However, even if my memory is 100% not coherent, I’d still expect the writes to go through. Is there some sort of memory barrier I should issue, like in OpenGL?

devsh · March 7, 2020, 7:51pm

I’ve tried hand-editing the PTX before it goes into OptiX, trying out different cache operators on the sust instruction, such as .wb, .wt and .cg.

I even added a stream synchronize and an OpenGL memory barrier:


		cuda::CCUDAHandler::cuda.pcuStreamSynchronize(stream);
		video::COpenGLExtensionHandler::extGlMemoryBarrier(GL_ALL_BARRIER_BITS);

Still to no avail.

devsh · March 7, 2020, 9:29pm

It seems that OptiX is excising the sust instruction from the PTX during its compilation.

original CUDA C

original PTX (%r3 and %r4 contain the launch_index.xy)

BB0_3:
    ld.const.u64 	%rd8, [params];
    shl.b32     %r8, %r3, 4;
    mov.u32     %r9, 1065353216;
    mov.u32     %r10, 1132396544;
    sust.b.2d.cg.v4.b32.trap     [%rd8, {%r8, %r4}], {%r10, %r10, %r10, %r9};
    ret;
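
For readers decoding the PTX above: the two `mov.u32` immediates are IEEE-754 bit patterns, and `shl.b32 ..., 4` turns the x pixel index into a byte offset, since a `sust.b` store addresses the x coordinate in bytes and a 4 x 32-bit texel is 16 bytes wide. A minimal host-side C++ sketch of both readings follows; the helper names are hypothetical and not part of the original program.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret a 32-bit pattern as an IEEE-754 float.
inline float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical helper: x coordinate in bytes for a texel of `texel_bytes` bytes.
inline std::uint32_t surface_byte_x(std::uint32_t pixel_x, std::uint32_t texel_bytes) {
    return pixel_x * texel_bytes;
}

// Self-check of the readings described above.
inline void check_ptx_constants() {
    assert(bits_to_float(1132396544u) == 255.0f);     // %r10
    assert(bits_to_float(1065353216u) == 1.0f);       // %r9
    assert(surface_byte_x(3u, 16u) == (3u << 4));     // same as shl.b32 by 4
}
```

Under that reading, the store writes the texel (255, 255, 255, 1) at pixel (launch_index.x, launch_index.y).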

disassembly (no sust instruction)

0x0000029dab64ba90  [296] shl.b32 	%r28, %r17, 4; 
0x0000029dab64ba90               IMAD.SHL.U32 R10, R16, 0x10, RZ  
0x0000029dab64baa0  [307] st.param.b8	[param0+3], %rs16; 
0x0000029dab64baa0               PRMT R4, R13, 0x654, R0  
0x0000029dab64bab0  [328] call.uni  
0x0000029dab64bab0               MOV R20, 0x0  
0x0000029dab64bac0               MOV R21, 0x0  
0x0000029dab64bad0               CALL.ABS.NOINC 0x0

dhart · March 9, 2020, 4:55pm

Hi devsh,

Which OptiX version and driver did you try this with? I haven’t actually tried writing to a surface in an OptiX program; it is possible there’s a bug. Do you have a complete & minimal reproducer you could share with us?

FWIW, if you launch a 1280x720 OptiX raygen, we choose the block size, but other than that it does equate to a kernel whose dimension is 1280 * 720 threads. You can verify this in Nsight Compute, for example. The main thing you need to be aware of and careful with is that OptiX programs are not CUDA, even though we’re trying to make it as close as possible. OptiX shaders cannot use shared memory, synchronizations, barriers, or other SM-thread-specific programming constructs in device code.

–
David.

devsh · March 9, 2020, 7:04pm

I’m using OptiX 7.0.0, the latest one.

I could try to put together a minimal and complete reproducer, but my stuff is always NVRTC JIT compiled and uses OpenGL interop where OpenGL owns the textures and buffers, so I don’t think it would be as nice for you to debug as the code from here.

I’d much rather you patch the Hello World SDK sample with this guy’s changes (far better repro sample)

Actually you use persistent threads (a common trick in raytracing and HPC GPGPU). I can see that you’re launching 288 blocks of 64 invocations on my RTX 2070, which turns out to be 8 invocations per “CUDA core” (I have 2304 of those: 36 SMs, and Turing can do 64 SIMD in 2 warps of 32).
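
The figures quoted in this post can be checked with a little arithmetic. This is a sketch that uses only the numbers stated above; it does not query real hardware, and the pixels-per-thread figure only applies if the launch really were persistent, which is the post's presumption rather than an established fact.

```cpp
#include <cassert>

// Figures quoted in the post for an RTX 2070.
constexpr int kBlocks = 288;
constexpr int kThreadsPerBlock = 64;
constexpr int kSMs = 36;
constexpr int kCoresPerSM = 64;

constexpr int launched_threads() { return kBlocks * kThreadsPerBlock; }
constexpr int cuda_cores() { return kSMs * kCoresPerSM; }
constexpr int threads_per_core() { return launched_threads() / cuda_cores(); }

// Pixels each thread would have to cover if those threads were persistent
// and split a width x height launch evenly (rounded up).
constexpr int pixels_per_thread(int width, int height) {
    return (width * height + launched_threads() - 1) / launched_threads();
}

static_assert(cuda_cores() == 2304, "36 SMs x 64 cores");
static_assert(threads_per_core() == 8, "18432 threads over 2304 cores");
```

For the 1280x720 launch discussed in this thread, `pixels_per_thread(1280, 720)` evaluates to 50 under that assumption.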

So I’d presume there’s some cooperative CUDA going on or a shared global atomic counter and a circular buffer work-list ;)

Yeah, noted, already knew that… but image storage from a kernel is not any of the above, right?

dhart · March 9, 2020, 10:43pm

I asked around and discovered that we have an open bug report on surface writes; it’s indeed not working correctly in OptiX. I’ll follow up here when it’s fixed. Thanks for the report.

–
David.

shocker.0x15 · April 22, 2020, 7:59pm

Hi, I ran into the same issue while investigating a related one: surf2Dread in OptiX kernel.

I created a minimal reproducer for this issue.

I created two equivalent kernels: one is written as an OptiX raygen program, the other is a normal CUDA kernel.
The reproducer creates an array (512x256, float4) and fills all the pixels with red.
Both kernels read the array via a surface object and add a blue gradient over the red image. If the kernels work as expected, the resulting image should be a red-to-purple gradient.

The reproducer is set to use the OptiX kernel by default. The result is a completely red image in my environment. On the other hand, the CUDA kernel (which can be enabled by commenting out USE_OPTIX_KERNEL in test_shared.h) produces the gradient.
For validation purposes, I added a macro that switches the surface object to a plain buffer when USE_SURFACE_OBJECT is commented out. In this case both kernels produce the gradient.
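
As a CPU-side reference for what the test should produce, here is a minimal C++ sketch. The type and function names are hypothetical, and the linear left-to-right blue ramp is an assumption, since the reproducer's exact gradient is not described in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Float4 { float x, y, z, w; };  // stand-in for CUDA's float4 (r, g, b, a)

// Fill a width x height image with opaque red.
inline std::vector<Float4> make_red_image(int width, int height) {
    return std::vector<Float4>(static_cast<std::size_t>(width) * height,
                               Float4{1.0f, 0.0f, 0.0f, 1.0f});
}

// Assumed gradient: blue ramps linearly from 0 at the left edge to 1 at the right.
inline void add_blue_gradient(std::vector<Float4>& image, int width, int height) {
    const float denom = width > 1 ? static_cast<float>(width - 1) : 1.0f;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            image[static_cast<std::size_t>(y) * width + x].z += x / denom;
}

// True if the pixel is still untouched red, which is the failure signature.
inline bool is_pure_red(const Float4& p) {
    return p.x == 1.0f && p.y == 0.0f && p.z == 0.0f;
}
```

With a working write path, only the leftmost column stays pure red after `add_blue_gradient`; an output where every pixel passes `is_pure_red` matches the failure reported here.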

Thanks,

Environment:
Windows 10, 1909
NVIDIA Driver: 445.87
CUDA 10.1.243
OptiX 7.0, installed at the default location.
Visual Studio Community 2019, 16.5.4
RTX 2070

droettger · April 23, 2020, 9:12am

Thanks. I downloaded the reproducer project and filed another bug for investigation.

droettger · August 5, 2020, 7:08am

@devsh Just in case you hadn’t seen the message in the thread linked in comment 7, the R450 display drivers supporting OptiX 7.1.0 fixed the surface access for that case.
Please check whether you get the expected results with drivers from that branch. Thanks.
