Strange Validation Error

Hi,

in my OptiX 7.5-based path tracer app (using Detlef’s renderer architecture from the Advanced Samples) I encountered a strange error on a DEBUG run:
Validation mode found current CUDA context does not match the CUDA context associated with the supplied OptixDeviceContext
I have never seen this before. Many other scenes (using animated geometry updates/MDL/textures/bones/lights/…) render without problems with the current version of my app.
It happens when creating the first GAS of the scene
(prim_count=12 triangles, num_vertices=24, no refit, using index buffer)

from the debug output (debug level 4):
[4][KNOBS]: All knobs on default.
do optixDeviceContextCreate
created context state.context=27146b66240h
CUDA stream successfully created state.stream=27102366ed0h
OptiX7 cache successfully switched OFF
[…]
optixAccelComputeMemoryUsage(state.context, … ) using compaction succeeds
gas_buffer_sizes.tempSizeInBytes=300h
gas_buffer_sizes.outputSizeInBytes=1000h
alloc d_temp_buffer
alloc non-compacted output
allow_compacted=1
state.context=27146b66240h <<< SAME cuda context !
do GlobalSync using cudaDeviceSynchronize() OK, also no last error
do optixAccelBuild
[2][ERROR]: Validation mode found current CUDA context does not match the CUDA context associated with the supplied OptixDeviceContext

On a RELEASE run there seems to be a stack overflow (VS2019 cannot even show all the stack frames; see screenshot)

What could the underlying issue be here?
I cannot provide a reproducer yet, because it only happens in the app with an animated scene when initially adding geometry.

Thank you.

screenshot:

My System:
OptiX 7.5.0 SDK
CUDA 11.7
GTX 1050 2GB
Win10PRO 64bit (version 21H1; build 19043.1237)
8GB RAM
device driver: 516.59
VS2019 v16.11.17
MDL SDK 2020.1.2
Windows SDK 10.0.19041.0

internal id: (938U12) for me to remember this module version

When you say you encountered this on a debug run, does that mean the application works fine without the OptiX validation enabled?
In that case, there might be an issue inside the validation mode only which would need to be checked internally.

If that only happens with a specific animated scene when initially adding geometry, can you provide some more information on the AS build inputs and geometry flags, whether you’re using motion blur and with how many animation keys, the render graph hierarchy, and everything else which would potentially help to reproduce this?

The debug stack frame looks like there was some infinite recursion.
You’re running on a very small graphics board and system. Could you check the available memory while running in release and debug mode? I want to make sure you’re not hitting a hardware limit when running in debug and validation mode.

Please compare the results of cudaMemGetInfo or cuMemGetInfo directly before the issue happens in release and debug mode and also look at the host RAM usage inside the TaskManager.
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g376b97f5ab20321ca46f7cfa9511b978
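A minimal sketch of such a check with the CUDA runtime API (printGpuMem is just an illustrative helper name; call it with a tag directly before the failing optixAccelBuild in both build configurations):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Log free/total device memory with a tag, so release and debug runs
// can be compared at the same point in the frame.
static void printGpuMem(const char* tag)
{
    size_t freeBytes = 0, totalBytes = 0;
    const cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess)
    {
        std::printf("%s: cudaMemGetInfo failed: %s\n", tag, cudaGetErrorString(err));
        return;
    }
    std::printf("%s: free=%zxh (~%.1fmb) total=%zxh (~%.1fmb)\n",
                tag, freeBytes, freeBytes / (1024.0 * 1024.0),
                totalBytes, totalBytes / (1024.0 * 1024.0));
}
```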

Thank you for your answer.

Sorry, the screenshot is from the RELEASE run, not from DEBUG;
The application crashes on a RELEASE run; there validation is not enabled.

I will go through your suggestions next; and I will provide more data then.

The only object in the scene is memory-generated geometry, animated by Nvidia Blast (in a separate module); all other animations are switched off.
As far as I know, Nvidia Blast only uses DirectX 11 and PhysX 3 (the PhysX3* .DLLs are shipped within the Blast package), no CUDA directly.
So I thought it could not affect the CUDA context; but maybe I missed something there?
Now I have seen that PhysX uses CUDA:
From PhysX info :
[…]PhysX uses both the CPU and GPU, but generally the most computationally intensive operations are done on the GPU.[…]
And so could there somehow be a CUDA context switch?
I’m at a loss what to do about that…
Should I try to switch the context back to the original one I got on initialization through cuCtxGetCurrent? But where? On ANY CUDA-related function call?
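For reference, the context could be captured once right after the OptiX/CUDA initialization; a sketch, where m_optixCudaContext is an illustrative name, not from the app:

```cpp
#include <cuda.h>
#include <cuda_runtime.h>

// Illustrative global holding the CUDA context OptiX was initialized with.
static CUcontext m_optixCudaContext = nullptr;

static void captureOptixCudaContext()
{
    cudaFree(0); // make sure the runtime's primary context exists and is current
    cuCtxGetCurrent(&m_optixCudaContext); // remember it for later restoration
}
```

Later, cuCtxSetCurrent(m_optixCudaContext) would restore it if a library has switched contexts in between.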

The interaction with the geometry is working when rendering memory-generated geometry by Nvidia Flex animation output.

My App Design:
BaseInitializations (in this order):

  1. init NVIDIA WaveWorks (if the scene uses it; it’s not CUDA-related)
  2. init OptiX => ( CUDA context initialized )
    initial initialization of object data; in this case 0 objects, because there is only the one memory-generated animated object, which is not present yet; so only its material is initialized
  3. init NVIDIA Flow (if such an animation is present in the scene; it’s not CUDA-related)
  4. init Nvidia Flex (if such an animation is present in the scene; it’s not CUDA-related)
  5. init Nvidia Blast (if such an animation is present in the scene; uses CUDA through PhysX 3!); is another CUDA context created here???
  6. init other components (if the scene uses them; none of them is CUDA-related)

NOTE: in pure rasterizer output all these components work perfectly together without problems. After running the Blast solver the new geometry is sent as a new subset to Flex, which then updates the internal geometry subset (outside the ray module) after the Flex solver has finished. That geometry is then sent to the ray module in case the raytracer is active. And except for the Blast-generated objects, all others also work perfectly with raytraced rendering.

MainLoop (in this order):

  1. on frame move:

    • update lights / cam
    • update movie instructions which are driving the animation (only if movie playback active)
    • update global animation info:
      - wind vector / etc
      - UpdateBlastSolver (if such Nvidia Blast animation is present in the scene)
      => memory-generated geometry is sent to ray module (pure buffers);
      its material-information reference relation is updated for each newly generated fracture object from the underlying base object, whose mesh geometry will be virtually “destructed” by Blast into many new object subsets;
      (technically material_subset_id = base_subset_id; is performed)
      a “geometry instance” is set up for each piece; however, on the first frame there is simply only one subset;
      no optix-related calls here
      - UpdateFlexSolver (if such animation is present in the scene)
      => memory-generated geometry is sent to ray module; pure buffers
      - frameMoveWaveWorks (if present)
      - other DirectX11-related frame move calls (if their data is present)
  2. on frame render:

    • render some rasterizer output (if present in scene)
      NOTE: Raytracer & Rasterizer Depth is respected in both renderers
      => so rasterizer depth is passed to the raytracer and in a second rasterizer pass the raytracer depth is then also applied to the rasterizer output; this way both contents can be merged; pure CUDA kernels may be used here, but they have succeeded in all other tested scenes so far
    • render raytraced objects:
      • update all raytraced lights (if any)
      • update all raytraced mesh subsets (if dirty or if added)
        so here memory-generated “geometry instances” are processed;
        now the geometry device buffers are created by a derivative of Application::createSphere; normals/texcoords/tangents are merged into one buffer for space reduction
        => as input simply vertex and index buffers are used.
        UpdatePrevTransforms and related functions are called when morphing or the temporal denoiser is active (not in this case)
      • update ray materials (if dirty)
      • updateOptiXdata
        • updateGeometryInstances()
          for each active and dirty “geometry instance”: createGAS() is run
          => here the CUDA context validation error occurs
        • (re-)build IAS
        • (re-)build pipeline
        • update SBT
      • launch raytracing render (OptiX; using “optixLaunch”)
      • present to D3D (DirectX11) output

checking renderer inits:

  • cudaFree(0) init ok
  • I removed another cudaFree(0) call after optixInit(); => but still the same error
  • I now use cuCtxGetCurrent as shown here: cuCtxGetCurrent
    instead of: CUcontext cuCtx = 0; // where zero means “current context”
  • optixInit(); (in optix_stubs.h) succeeds
    and I changed the order so that the OptiX function table is initialized before optixDeviceContextCreate, like shown here: call optixDeviceContextCreate
    => the change did not have any effect on the reported issue
  • no motion blur used
  • no morphing used
  • no refit
  • all other animations are switched off; only this one object in the scene so far
    its geometry can be found in this file: test_b_obj.obj (908 Bytes)

(the buffer contents were written directly, formatted as .obj data, to the debug output and saved as an .obj file)
Concerning the validation error, it does not make a difference whether the winding is encoded as CW or CCW (exchanging the 2nd and 3rd index of each triangle)

  • the geometry is memory-generated from Nvidia Blast animation;
    but on the first frame that (destructive) animation has not even started; the geometry
    output is about the same as its input
    => PathTracer works fine when using such memory-generated geometry
    animated and re-created through Nvidia Flex animation.
  • in this case a scene graph in my app is always designed this way:
    root: IAS
    => for each object (subset) I use one IAS entry, which then points to a GAS entry
    The validation error already occurs while building a GAS, so building the IAS is not even
    reached in that case.
  • a pipeline was also not built yet, because first all GASes are created, then the IAS, and then the pipeline

===================================================================
// GAS creation:

	OptixBuildInput triangle_input = {};
	triangle_input.type = OPTIX_BUILD_INPUT_TYPE_TRIANGLES;

	triangle_input.triangleArray.indexFormat = OPTIX_INDICES_FORMAT_UNSIGNED_INT3;
	triangle_input.triangleArray.indexStrideInBytes = 3 * sizeof(unsigned int);
	triangle_input.triangleArray.numIndexTriplets = prim_count; // triangles
	triangle_input.triangleArray.indexBuffer = d_indices;

	const unsigned int triangle_input_flags[1] = { OPTIX_GEOMETRY_FLAG_NONE }; // assumed flags array; its declaration is not shown in the original snippet
	triangle_input.triangleArray.flags = &triangle_input_flags[0];
	triangle_input.triangleArray.numSbtRecords = 1;
	triangle_input.triangleArray.sbtIndexOffsetBuffer = NULL;
	triangle_input.triangleArray.sbtIndexOffsetSizeInBytes = 0;
	triangle_input.triangleArray.sbtIndexOffsetStrideInBytes = 0;

	triangle_input.triangleArray.vertexFormat = OPTIX_VERTEX_FORMAT_FLOAT3;
	triangle_input.triangleArray.vertexStrideInBytes = 3 * sizeof(float);
	triangle_input.triangleArray.numVertices = num_vertices;
	triangle_input.triangleArray.vertexBuffers = &d_vertices; // assumed CUdeviceptr d_vertices (counterpart to d_indices); this line is missing in the original snippet


	OptixAccelBuildOptions accel_options = {};
	if (allow_compacted)
	{
		accel_options.buildFlags = OPTIX_BUILD_FLAG_ALLOW_COMPACTION;
	}
	else
	{
		accel_options.buildFlags = OPTIX_BUILD_FLAG_NONE;
	}

    accel_options.operation = OPTIX_BUILD_OPERATION_BUILD;
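For completeness, the steps logged below (memory query, allocations, build) roughly correspond to this sketch; OPTIX_CHECK/CUDA_CHECK are assumed error-checking macros in the style of the SDK samples, and state.context/state.stream are the ones from the debug output:

```cpp
// Sketch of the remaining build steps, assuming the triangle_input and
// accel_options set up as shown above.
OptixAccelBufferSizes gas_buffer_sizes = {};
OPTIX_CHECK(optixAccelComputeMemoryUsage(state.context, &accel_options,
                                         &triangle_input, 1, &gas_buffer_sizes));

CUdeviceptr d_temp_buffer = 0, d_gas_output = 0;
CUDA_CHECK(cudaMalloc(reinterpret_cast<void**>(&d_temp_buffer),
                      gas_buffer_sizes.tempSizeInBytes));
CUDA_CHECK(cudaMalloc(reinterpret_cast<void**>(&d_gas_output),
                      gas_buffer_sizes.outputSizeInBytes));

// This is the call that triggers the validation error: at this point the
// *current* CUDA context must be the one state.context was created in.
OptixTraversableHandle gas_handle = 0;
OPTIX_CHECK(optixAccelBuild(state.context, state.stream, &accel_options,
                            &triangle_input, 1,
                            d_temp_buffer, gas_buffer_sizes.tempSizeInBytes,
                            d_gas_output, gas_buffer_sizes.outputSizeInBytes,
                            &gas_handle, nullptr, 0));
```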

debug output:

OPTIX_BUILD_FLAG_ALLOW_COMPACTION
OPTIX_BUILD_OPERATION_BUILD
OptixAccelBuildOptions.buildFlags=2h
OptixAccelBuildOptions.operation=2161h

cudaMemGetInfo => free_gpu_mem=135a3667h (~309.6mb) total_gpu_mem=7ffc0000h (~2047.8mb)

do optixAccelComputeMemoryUsage
cudaMemGetInfo => free_gpu_mem=135a3667h (~309.6mb) total_gpu_mem=7ffc0000h (~2047.8mb)

gas_buffer_sizes.tempSizeInBytes=300h
gas_buffer_sizes.outputSizeInBytes=1000h
alloc d_temp_buffer
alloc non-compacted output
allow_compacted=1 >>> NOTE: even if compaction is off, same error
state.context=19159d374c0h
do GlobalSync
do optixAccelBuild
[2][ERROR]: Validation mode found current CUDA context does not match the CUDA context associated with the supplied OptixDeviceContext
cudaMemGetInfo => free_gpu_mem=135a3667h (~309.6mb) total_gpu_mem=7ffc0000h (~2047.8mb)

debug output if “allow compaction” is OFF:
OPTIX_BUILD_FLAG_NONE
OPTIX_BUILD_OPERATION_BUILD
OptixAccelBuildOptions.buildFlags=0h
OptixAccelBuildOptions.operation=2161h
cudaMemGetInfo => free_gpu_mem=145f3667h (~326.0mb) total_gpu_mem=7ffc0000h (~2047.8mb)
do optixAccelComputeMemoryUsage
cudaMemGetInfo => free_gpu_mem=145f3667h (~326.0mb) total_gpu_mem=7ffc0000h (~2047.8mb)
same validation error


Task Manager Memory Usage:

and here the TaskManager on crashing (RELEASE, no validation mode enabled):


here are some successfully rendered scenes (one with even lower free GPU memory at the same stage where the validation error occurs in the reported case):

I think it’s not a memory limit issue.

Currently I simply don’t really know where to search for the error…
If it were memory-related, I would assume that an out-of-memory message would occur, because in some other cases, where the geometry was too complex, I consistently got such out-of-memory error messages.


here are some Task Manager screenshots from the 4 successfully running
scenes shown above (all RELEASE mode):

Please tell me if you need further details.


[changes in this reply-sub-post are applied in internal id 939U14]

Yes, if there are other SDKs involved and some CUDA context mismatch error is happening, it makes a lot of sense to try setting the CUDA context you’ve created for OptiX usage explicitly.

That means I would try putting cuCtxPushCurrent and cuCtxPopCurrent pairs around the invocations of the other SDKs to make sure they do not change your expected CUDA context when you are calling into OptiX.

Or you could set the current CUDA context with cuCtxSetCurrent after each of these other SDK invocations.

Do that for each of your additional SDKs individually to see if one of them breaks it.

(I use cuCtxSetCurrent in all examples which support multi-GPU and prefer the CUDA driver API because that allows explicit control over the CUDA context. https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo10/src/Device.cpp#L1552
With the CUDA runtime you would switch devices with the cudaSetDevice() function, which might not help in your case.)

Since you’re using the CUDA runtime API, another question would be whether the CUDA context which gets automatically created on the first CUDA call is actually the one you created inside the OptiX code with that dummy cudaFree(0) call, or whether it is reusing an already existing CUDA context created by the initialization of any of your other SDKs in use.

In any case, the involved libraries would need to correctly push and pop the currently active CUDA context to make sure the caller is using its expected one. If that is not the case, that would be an error inside the called library.
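A sketch of that push/pop pairing as a small RAII guard; m_optixCudaContext stands in for the context created for OptiX usage:

```cpp
#include <cuda.h>

// RAII guard: make a known CUDA context current for the enclosed scope and
// restore the previously current context when the scope ends.
class ScopedCudaContext
{
public:
    explicit ScopedCudaContext(CUcontext ctx) { cuCtxPushCurrent(ctx); }
    ~ScopedCudaContext()
    {
        CUcontext popped = nullptr;
        cuCtxPopCurrent(&popped); // discard whatever the callee left current
    }
    ScopedCudaContext(const ScopedCudaContext&) = delete;
    ScopedCudaContext& operator=(const ScopedCudaContext&) = delete;
};

// Usage around a suspect SDK call (UpdateBlastSolver as an example):
// {
//     ScopedCudaContext guard(m_optixCudaContext);
//     UpdateBlastSolver(); // if this switches the context, the pop undoes it
// }
```

Because cuCtxPopCurrent removes the top of the context stack, the context that was current before the push is restored even if the wrapped library called cuCtxSetCurrent in between.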

Maybe use Nsight Systems to see what CUDA API calls are done by your application including all additional libraries.

Other than that, it’s always a good idea to try different display drivers in case this is an error inside OptiX. There are three newer driver releases available for your system configuration.

Given your description of the modules involved, there is no way to reproduce this without an executable in failing state.


I now use “cuCtxSetCurrent” when entering “render raytraced objects:” on “frame render” in “MainLoop” (see my “App Design” above)

This did the trick! :) Detlef, thank you very much!
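A minimal sketch of that fix, with the context handle being a hypothetical value captured via cuCtxGetCurrent right after OptiX initialization:

```cpp
#include <cuda.h>

// Called at the top of the "render raytraced objects" step of the frame:
// force the context OptiX was created with to be current again, in case
// Blast/PhysX switched it during frame move.
static void enterRaytracedRendering(CUcontext optixCudaContext)
{
    const CUresult res = cuCtxSetCurrent(optixCudaContext);
    if (res != CUDA_SUCCESS)
    {
        // log and bail out before touching any OptiX state
        return;
    }
    // ... updateGeometryInstances() / createGAS() / optixAccelBuild() ...
}
```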

Video: raytraced “destructed” Blast object + rasterizer Flex fluid animation; NOTE: all the Blast-generated objects are forwarded as collision constraints to Flex and then are added to OptiX as GASes in a rebuilt IAS; denoiser not used:

Flex and Blast themselves obviously don’t use CUDA, but PhysX seems to use it. Obviously I missed that CUDA sub-dependency when implementing Nvidia Blast in my modules.

The “BaseInitializations” in my App Design (see previous post) are created in the order shown, so the OptiX CUDA context is the first one created; Nvidia WaveWorks does not use CUDA in my module. When I built all the DLL modules using the SDKs, I always chose DirectX11 (or DirectX12) from the different implementation alternatives; but in the case of Blast I obviously overlooked the CUDA sub-dependency of PhysX.

However, everything now seems to work fine; even with temporal denoising active there are no memory issues:

Once again thank you very much !!!

The cleaner solution would have been the cuCtxPushCurrent and cuCtxPopCurrent calls around the other library calls.
That would have isolated where the CUDA context actually changed inadvertently, which in turn would provide the information for the respective module’s development team to investigate why that happened.

The functions “InitBlast” and “UpdateBlastSolver” in my NvidiaBlastSDK-based module were not present in the original SDK. I added them there.
Those functions call Application->initApp() and Application->Animate(), which are also not present in the original SDK; I use a derivative of the “Application” class defined as shown in “samples\SampleBase\core\Application.h” in the Blast SDK. That class is "sample code", so I also added initApp() and Animate() there.
So when I developed the functions, I did not know that PhysX itself can use the GPU through CUDA. I double-checked whether there is a cudaFree(0) call somewhere in the Blast SDK (and in my derivatives), but there isn’t.
The Blast SDK itself has some PhysX3 .dll files as dependencies, which I simply use as they are (from Blast 1.1 from 2017); there the CUDA context seems to be switched. (There is a newer version of PhysX, so maybe that already addresses this, but the version I used from the SDK is the one recommended for Blast.)
So I think it’s up to me (the app developer) to ensure the correct context switch.

The controller module in my app (which executes UpdateBlastSolver) is completely separated from the raytracing module; that controller module simply does not use any CUDA-related code.
From what I read on PhysX Faq, […]PhysX uses both the CPU and GPU, but generally the most computationally intensive operations are done on the GPU.[…]
So that sounds to me like PhysX decides, depending on the current system configuration, what to do.
On Blast :[…] “Blast, at the low-level and toolkit layer are physics and graphics agnostic”[…]
So maybe that would include a push/pop as you suggested, Detlef.

With my current solution I think it’s much safer with regard to any (also future) SDK implementations; it’s also faster to do one general set-current-context call instead of a push and pop around each other SDK separately.
However, I’m sure that somewhere in Blast/PhysX the CUDA context is switched. The other SDKs do not use CUDA at all in my implementations.

The latest Blast version 1.1.7 (from 2020) was made for VS 2017.
But unfortunately I was not able to build this new version today from a fresh start with VS2019.
The Blast-based module I still use was built in 2017, and I upgraded it later to VS2019.

In the PhysX 3.4 backup zip (of 2017) readme.md I found:
"(2) The APEX SDK distribution contains pre-built binaries supporting GPU acceleration. Re-building the APEX SDK removes support for GPU acceleration. The solutions can be found under APEX_1.4\compiler. "
This file seems to be the one, which contains it:
PhysX-3.4-master\APEX_1.4\bin\vc14win64-PhysX_3.4\PhysX3Gpu_x64.dll (26,727,136bytes June 2nd 2017)
I did not rebuild it, because I want to keep the GPU acceleration.

The version of that file used as a dependency in my NvBlast-based module is:
“PhysX3GpuCHECKED_x64.dll” (26,414,592 bytes, Sep 26th 2017)
it’s 100% identical to the one found in the Blast v1.1 project: “Blast_vc14\bin\vc14win64-cmake\PhysX3GpuCHECKED_x64.dll”
which was copied from the SDK shipped DLLs:
“\NVIDIA\packman-repo\PhysX-vc14win64\3.4.21652946\bin\vc14win64-cmake-staticcrt\PhysX3GpuCHECKED_x64.dll” (26,414,592bytes June 2nd 2017)

Although there was a PhysX 3.4 / APEX 1.4 patch release @23933511 (see the commit in 2018),
it seems that it did not affect any GPU-related stuff.
So basically the version I use is very similar to: Release 3.4.1 · NVIDIAGameWorks/PhysX-3.4 · GitHub

The current PhysX 3.4 project says:
[…]Welcome to NVIDIA’s PhysX and APEX SDK source code repository. This depot includes the PhysX SDK, the APEX SDK, and the Kapla Demo application.
NOTE: The APEX SDK is not needed to build either the PhysX SDK nor the demo and has been deprecated. It is provided for continued support of existing applications only. We recommend the following libraries as replacements:
For APEX Clothing: NvCloth - GitHub - NVIDIAGameWorks/NvCloth
For APEX Destruction: Blast - GitHub - NVIDIAGameWorks/Blast: A modular destruction SDK designed for performance and flexibility, replacing APEX destruction
For APEX Particles: Flex - GitHub - NVIDIAGameWorks/FleX
For APEX Turbulence: Flow - GitHub - NVIDIAGameWorks/Flow: Flow is a sparse grid-based fluid simulation library for real-time applications.
[…]

However, the replacement for “APEX Destruction” is “Blast”, and that still requires the shipped DLLs as dependencies…