Hello,
I have a problem that seems to be related to scalar writes and cache coherency.
I have an OptiX kernel which takes the following pipeline launch parameters:
struct PipelineLaunchParameters {
StaticPipelineLaunchParameters* s;
PerFramePipelineLaunchParameters* f;
};
Here, StaticPipelineLaunchParameters is:
struct StaticPipelineLaunchParameters {
...
LightDistribution lightInstDist;
...
};
My program launches pure CUDA kernels before the OptiX kernel to update some members in lightInstDist.
LightDistribution is a struct describing a discrete cumulative distribution function:
class LightDistribution {
float* m_weights;
float* m_CDF;
float m_integral;
uint32_t m_numValues;
};
The first CUDA kernel updates the array m_weights (one element per thread) and m_numValues (from a representative thread); then a CUB scan over m_weights fills m_CDF. Finally, another CUDA kernel updates m_integral from a single thread.
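For reference, the update sequence looks roughly like this (a simplified sketch: the kernel names, launch configuration, and separate raw-pointer arguments are illustrative, and I'm writing the members as directly accessible for brevity):

#include <cub/cub.cuh>

__global__ void updateWeights(LightDistribution* dist, const float* importance,
                              uint32_t numLights) {
    const uint32_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numLights)
        dist->m_weights[i] = importance[i]; // one element per thread
    if (i == 0)
        dist->m_numValues = numLights;      // representative thread
}

__global__ void updateIntegral(LightDistribution* dist) {
    // launched <<<1, 1>>>: the total weight is the last entry of the inclusive scan
    dist->m_integral = dist->m_CDF[dist->m_numValues - 1];
}

// Host side, everything enqueued on the same CUDA stream.
// weightsDev/cdfDev are the host-side copies of the pointers stored in
// dist->m_weights / dist->m_CDF; tempStorageDev was sized beforehand.
void updateLightDistribution(LightDistribution* distDev, const float* importanceDev,
                             float* weightsDev, float* cdfDev, uint32_t numLights,
                             void* tempStorageDev, size_t tempStorageBytes,
                             cudaStream_t stream) {
    const uint32_t blockSize = 256;
    const uint32_t numBlocks = (numLights + blockSize - 1) / blockSize;
    updateWeights<<<numBlocks, blockSize, 0, stream>>>(distDev, importanceDev, numLights);
    cub::DeviceScan::InclusiveSum(tempStorageDev, tempStorageBytes,
                                  weightsDev, cdfDev, numLights, stream);
    updateIntegral<<<1, 1, 0, stream>>>(distDev);
    // ... the optixLaunch follows on the same stream.
}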
However, the OptiX kernel fails to read the updated values of m_integral and m_numValues in the Debug build (the OptiX kernel uses OPTIX_COMPILE_OPTIMIZATION_LEVEL_0 and OPTIX_COMPILE_DEBUG_LEVEL_FULL).
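The reads in the OptiX kernel are plain member accesses through the parameter pointer, along these lines (sketch; plp and the raygen program name are placeholders, following the usual extern "C" __constant__ launch-parameter pattern):

extern "C" __constant__ PipelineLaunchParameters plp;

extern "C" __global__ void __raygen__main() {
    const LightDistribution& dist = plp.s->lightInstDist;
    const float    integral  = dist.m_integral;  // comes back stale in Debug
    const uint32_t numValues = dist.m_numValues; // likewise
    // ...
}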
I’m not sure whether I’m doing something wrong, particularly since the handling of OptiX’s pipeline launch parameters is opaque to me.
Are there caveats to doing things like this? Is there a way to handle it correctly (e.g., by using the volatile keyword somewhere)?
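For instance, is something like this the intended fix (a hypothetical sketch of what I mean, inside the OptiX kernel)?

// force the scalar loads through volatile so they bypass any cached copy
const float integral = *reinterpret_cast<const volatile float*>(
    &plp.s->lightInstDist.m_integral);
const uint32_t numValues = *reinterpret_cast<const volatile uint32_t*>(
    &plp.s->lightInstDist.m_numValues);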
Thanks,
Windows 11 22H2 (22621.1483)
RTX 4080 16GB
Driver 531.29
CUDA 12.1 (and -std=c++20 for kernels)
OptiX 7.6.0
Visual Studio Community 2022, 17.5.3