Strange Validation Error

The only object in the scene is memory-generated geometry, animated by NVIDIA Blast (in a separate module); all other animations are switched off.
As far as I know, NVIDIA Blast only uses DirectX 11 and PhysX 3 (the PhysX3*.dll files are shipped with the Blast package), no CUDA directly.
So I thought it could not affect the CUDA context; but maybe I missed something there?
However, I have seen that PhysX uses CUDA.
From the PhysX info:
[…] PhysX uses both the CPU and GPU, but generally the most computationally intensive operations are done on the GPU. […]
So could there somehow be a CUDA context switch?
I’m lost as to what to do about that…
Should I try to switch the context back to the original one I got on initialization through cuCtxGetCurrent? But where? On every CUDA-related function call?
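The pattern I have in mind would look roughly like this (a minimal sketch; `g_optixCudaCtx`, `captureOptixCudaContext`, and `ensureOptixCudaContext` are made-up names, error handling omitted):

```cpp
#include <cuda.h>
#include <cstdio>

// Capture the CUDA context once, right after OptiX/CUDA initialization,
// then compare (and restore) it before issuing OptiX calls.
static CUcontext g_optixCudaCtx = nullptr;

void captureOptixCudaContext()            // call once after OptiX init
{
    cuCtxGetCurrent(&g_optixCudaCtx);
}

void ensureOptixCudaContext()             // call before OptiX work
{
    CUcontext current = nullptr;
    cuCtxGetCurrent(&current);
    if (current != g_optixCudaCtx)
    {
        // Another component (e.g. PhysX/Blast) left its own context bound.
        fprintf(stderr, "CUDA context changed: %p -> %p, restoring\n",
                (void*)current, (void*)g_optixCudaCtx);
        cuCtxSetCurrent(g_optixCudaCtx);
    }
}
```

Instead of guarding every single CUDA-related call, it might be enough to call the check once at the top of each OptiX-using entry point (e.g. before the acceleration-structure updates and before optixLaunch).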

The interaction with the geometry works when rendering memory-generated geometry produced by the NVIDIA Flex animation output.

My App Design:
BaseInitializations (in this order):

  1. init NVIDIA WaveWorks (if the scene uses it; it’s not CUDA-related)
  2. init OptiX => ( CUDA context initialized )
    initial initialization of object data; in this case 0 objects, because the only object is the memory-generated animated one, which is not present yet; so only its material is initialized
  3. init NVIDIA Flow (if such an animation is present in the scene; it’s not CUDA-related)
  4. init NVIDIA Flex (if such an animation is present in the scene; it’s not CUDA-related)
  5. init NVIDIA Blast (if such an animation is present in the scene; it uses CUDA through PhysX 3!); is another CUDA context created here???
  6. init other components (if the scene uses them; none of them is CUDA-related)

NOTE: in pure rasterizer output all these components work perfectly together without problems. After running the Blast solver, the new geometry is sent as a new subset to Flex, which then updates the internal geometry subset (outside the ray module) after the Flex solver has finished. That geometry is then sent to the ray module in case the raytracer is active. And except for the Blast-generated objects, all others also work perfectly in raytraced rendering.
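Since the init order is fixed, logging the currently bound context after each step would show exactly which component swaps it; a minimal sketch (the helper name is made up):

```cpp
#include <cuda.h>
#include <cstdio>

// Log the currently bound CUDA context with a tag; sprinkle this after
// each init step (WaveWorks, OptiX, Flow, Flex, Blast) and after each
// solver update to see where the context changes.
void logCudaContext(const char* tag)
{
    CUcontext ctx = nullptr;
    cuCtxGetCurrent(&ctx);
    printf("[ctx-check] %-24s current CUcontext = %p\n", tag, (void*)ctx);
}

// e.g. during BaseInitializations:
//   logCudaContext("after OptiX init");   // step 2
//   logCudaContext("after Blast init");   // step 5
```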

MainLoop (in this order):

  1. on frame move:

    • update lights / cam
    • update movie instructions which are driving the animation (only if movie playback active)
    • update global animation info:
      - wind vector / etc
      - UpdateBlastSolver (if such an NVIDIA Blast animation is present in the scene)
      => memory-generated geometry is sent to the ray module (pure buffers);
      its material-information reference relation is updated for each newly generated fracture object, derived from the underlying base object, whose mesh geometry is virtually “destructed” by Blast into many new object subsets;
      (technically, material_subset_id = base_subset_id; is performed)
      a “geometry instance” is set up for each piece; however, on the first frame there is only one subset;
      no OptiX-related calls here
      - UpdateFlexSolver (if such an animation is present in the scene)
      => memory-generated geometry is sent to the ray module; pure buffers
      - frameMoveWaveWorks (if present)
      - other DirectX11-related frame move calls (if their data is present)
  2. on frame render:

    • render some rasterizer output (if present in scene)
      NOTE: Raytracer & Rasterizer Depth is respected in both renderers
      => so the rasterizer depth is passed to the raytracer, and in a second rasterizer pass the raytracer depth is then also applied to the rasterizer output; this way both contents can be merged; pure CUDA kernels may be used here, but they succeed in all other tested scenes so far
    • render raytraced objects:
      • update all raytraced lights (if any)
      • update all raytraced mesh subsets (if dirty or if added)
        so here memory-generated “geometry instances” are processed;
        the geometry device buffers are created by a derivative of Application::createSphere; normals/texcoords/tangents are merged into one buffer for space reduction
        => as input, simply vertex and index buffers are used.
        UpdatePrevTransforms and related functions are called when morphing or the temporal denoiser is active (not in this case)
      • update ray materials (if dirty)
      • updateOptiXdata
        • updateGeometryInstances()
          for each active and dirty “geometry instance”: createGAS() is run
          => here the CUDA context validation error occurs
        • (re-)build IAS
        • (re-)build pipeline
        • update SBT
      • launch raytracing render (OptiX; using “optixLaunch”)
      • present to D3D (DirectX11) output

checking renderer inits:

  • cudaFree(0) init ok
  • I removed another cudaFree(0) call after optixInit(); => but still the same error
  • I now use cuCtxGetCurrent as shown here : cuCtxGetCurrent
    instead of : CUcontext cuCtx = 0; // where zero means “current context”
  • optixInit(); (in optix_stubs.h) succeeds
    and I changed the order, initializing the OptiX function table before optixDeviceContextCreate, as shown here: call optixDeviceContextCreate
    => the change did not have any effect related to the reported issue
  • no motion blur used
  • no morphing used
  • no refit
  • all other animations are switched off; only this one object in the scene so far
    its geometry can be found in this file: test_b_obj.obj (908 Bytes)

(The buffer contents were written directly to the debug output, formatted as .obj data, and saved as an .obj file.)
Concerning the validation error, it does not make a difference whether the geometry is encoded as CW or CCW (exchanging the 2nd and 3rd index of each triangle).

  • the geometry is memory-generated from the NVIDIA Blast animation;
    but on the first frame that (destructive) animation has not even started, so the
    geometry output is about the same as its input
    => the PathTracer works fine when using such memory-generated geometry
    animated and re-created through NVIDIA Flex animation.
  • in this case, a scene graph in my app is always designed this way:
    root: IAS
    => for each object (subset) I use one IAS entry, which then points to a GAS entry
    The validation already fails while building a GAS, so building the IAS is not even
    reached in that case.
  • a pipeline was also not built yet, because first all GASes are created, then the IAS, and then the pipeline

===================================================================
// GAS creation:

	// per-build-input geometry flags (one SBT record)
	const uint32_t triangle_input_flags[1] = { OPTIX_GEOMETRY_FLAG_NONE };

	OptixBuildInput triangle_input = {};
	triangle_input.type = OPTIX_BUILD_INPUT_TYPE_TRIANGLES;

	triangle_input.triangleArray.indexFormat = OPTIX_INDICES_FORMAT_UNSIGNED_INT3;
	triangle_input.triangleArray.indexStrideInBytes = 3 * sizeof(unsigned int);
	triangle_input.triangleArray.numIndexTriplets = prim_count; // triangles
	triangle_input.triangleArray.indexBuffer = d_indices;

	triangle_input.triangleArray.flags = &triangle_input_flags[0];
	triangle_input.triangleArray.numSbtRecords = 1;
	triangle_input.triangleArray.sbtIndexOffsetBuffer = 0;
	triangle_input.triangleArray.sbtIndexOffsetSizeInBytes = 0;
	triangle_input.triangleArray.sbtIndexOffsetStrideInBytes = 0;

	triangle_input.triangleArray.vertexFormat = OPTIX_VERTEX_FORMAT_FLOAT3;
	triangle_input.triangleArray.vertexStrideInBytes = 3 * sizeof(float);
	triangle_input.triangleArray.numVertices = num_vertices;
	triangle_input.triangleArray.vertexBuffers = &d_vertices; // CUdeviceptr to the vertex buffer

	OptixAccelBuildOptions accel_options = {};
	if (allow_compacted)
	{
		accel_options.buildFlags = OPTIX_BUILD_FLAG_ALLOW_COMPACTION;
	}
	else
	{
		accel_options.buildFlags = OPTIX_BUILD_FLAG_NONE;
	}

	accel_options.operation = OPTIX_BUILD_OPERATION_BUILD;
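The build itself then follows the steps visible in the debug output below; a sketch of that continuation (variable names follow the snippet above, `context` is the OptixDeviceContext, error-check macros omitted):

```cpp
// Size query, allocation, and the build that triggers the validation error:
OptixAccelBufferSizes gas_buffer_sizes = {};
optixAccelComputeMemoryUsage(context, &accel_options,
                             &triangle_input, 1, &gas_buffer_sizes);

CUdeviceptr d_temp_buffer = 0, d_gas_output = 0;
cudaMalloc(reinterpret_cast<void**>(&d_temp_buffer),
           gas_buffer_sizes.tempSizeInBytes);
cudaMalloc(reinterpret_cast<void**>(&d_gas_output),
           gas_buffer_sizes.outputSizeInBytes);

OptixTraversableHandle gas_handle = 0;
optixAccelBuild(context,
                0,                   // CUDA stream
                &accel_options,
                &triangle_input, 1,  // one build input
                d_temp_buffer, gas_buffer_sizes.tempSizeInBytes,
                d_gas_output,  gas_buffer_sizes.outputSizeInBytes,
                &gas_handle,
                nullptr, 0);         // no emitted properties
cudaFree(reinterpret_cast<void*>(d_temp_buffer));
```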

debug output:

OPTIX_BUILD_FLAG_ALLOW_COMPACTION
OPTIX_BUILD_OPERATION_BUILD
OptixAccelBuildOptions.buildFlags=2h
OptixAccelBuildOptions.operation=2161h

cudaMemGetInfo => free_gpu_mem=135a3667h (~309.6mb) total_gpu_mem=7ffc0000h (~2047.8mb)

do optixAccelComputeMemoryUsage
cudaMemGetInfo => free_gpu_mem=135a3667h (~309.6mb) total_gpu_mem=7ffc0000h (~2047.8mb)

gas_buffer_sizes.tempSizeInBytes=300h
gas_buffer_sizes.outputSizeInBytes=1000h
alloc d_temp_buffer
alloc non-compacted output
allow_compacted=1 >>> NOTE: even if compaction is off, same error
state.context=19159d374c0h
do GlobalSync
do optixAccelBuild
[2][ERROR]: Validation mode found current CUDA context does not match the CUDA context associated with the supplied OptixDeviceContext
cudaMemGetInfo => free_gpu_mem=135a3667h (~309.6mb) total_gpu_mem=7ffc0000h (~2047.8mb)

debug output if “allow compaction” is OFF:
OPTIX_BUILD_FLAG_NONE
OPTIX_BUILD_OPERATION_BUILD
OptixAccelBuildOptions.buildFlags=0h
OptixAccelBuildOptions.operation=2161h
cudaMemGetInfo => free_gpu_mem=145f3667h (~326.0mb) total_gpu_mem=7ffc0000h (~2047.8mb)
do optixAccelComputeMemoryUsage
cudaMemGetInfo => free_gpu_mem=145f3667h (~326.0mb) total_gpu_mem=7ffc0000h (~2047.8mb)
same validation error


Task Manager memory usage:

Here is the Task Manager while crashing (RELEASE build, no validation mode enabled):


And here are some successfully rendered scenes (one with even lower free GPU memory at the same stage where the validation error occurs in the reported case):

I think it’s not a memory-limit issue.

Currently I simply don’t really know where to search for the error…
If it were memory-related, I would assume that an out-of-memory message would occur, because in some other cases, where the geometry was too complex, I consistently got such out-of-memory messages.


here are some Task Manager screenshots from the 4 successfully running
scenes shown above (all RELEASE mode):

Please tell me if you need any further information.


[changes in this reply-sub-post are applied in internal id 939U14]