RTX Pathtracer crash

This is what I saw:

2024-01-24 06:10:56 [64,539ms] [Warning] [gpu.foundation.plugin] Waiting for compilation of ray tracing shaders by GPU driver: 30 seconds so far
2024-01-24 06:11:26 [94,539ms] [Warning] [gpu.foundation.plugin] Waiting for compilation of ray tracing shaders by GPU driver: 60 seconds so far
2024-01-24 06:11:56 [124,539ms] [Warning] [gpu.foundation.plugin] Waiting for compilation of ray tracing shaders by GPU driver: 90 seconds so far
2024-01-24 06:12:26 [154,540ms] [Warning] [gpu.foundation.plugin] Waiting for compilation of ray tracing shaders by GPU driver: 120 seconds so far
2024-01-24 06:12:56 [184,540ms] [Warning] [gpu.foundation.plugin] Waiting for compilation of ray tracing shaders by GPU driver: 150 seconds so far
/hoopla/data/dcc_check/scenes/TrawlerBoat_000.usda
Rendering /hoopla/data/dcc_check/scenes/TrawlerBoat_000.usda
2024-01-24 06:14:42 [289,942ms] [Error] [rtx.optixdenoising.plugin] [Optix] [DENOISER] Unable to load denoiser weights
2024-01-24 06:14:42 [289,942ms] [Error] [rtx.optixdenoising.plugin] optixDenoiserCreate(m_optixCtx, OPTIX_DENOISER_MODEL_KIND_AOV, &denoiserOptions, &denoiser) failed. Optix Error: OPTIX_ERROR_INTERNAL_ERROR.
Internal error
2024-01-24 06:13:26 [214,540ms] [Warning] [gpu.foundation.plugin] Waiting for compilation of ray tracing shaders by GPU driver: 180 seconds so far
2024-01-24 06:13:56 [244,541ms] [Warning] [gpu.foundation.plugin] Waiting for compilation of ray tracing shaders by GPU driver: 210 seconds so far
2024-01-24 06:14:26 [274,541ms] [Warning] [gpu.foundation.plugin] Waiting for compilation of ray tracing shaders by GPU driver: 240 seconds so far
2024-01-24 06:14:34 [282,188ms] [Warning] [gpu.foundation.plugin] Ray tracing shader compilation finished after 247 seconds
2024-01-24 06:14:42 [289,886ms] [Warning] [carb.flatcache.plugin] UsdRelationship /Render/RenderProduct_Replicator.orderedVars has multiple targets, which is not supported

2024-01-24 06:14:42 [290,367ms] [Error] [carb.crashreporter-breakpad.plugin] crash detected
2024-01-24 06:14:42 [290,367ms] [Warning] [carb.crashreporter-breakpad.plugin] A crash has occurred.  If a debugger should be attached, please set the '/crashreporter/debuggerAttachTimeoutMs' setting to a timeout in milliseconds.  This can be used to allow the crash reporter to wait for up to that long for a debugger to attach before processing or sending the crash report.
2024-01-24 06:14:42 [290,367ms] [Warning] [carb.crashreporter-breakpad.plugin] Uploading minidump: file:'/root/.local/share/ov/data/Kit/Code/2022.3/7b0adb7c-038a-447a-4ef9618e-0112b38c.dmp' svr:'https://services.nvidia.com/submit'
2024-01-24 06:14:42 [290,367ms] [Warning] [carb.crashreporter-breakpad.plugin] Crash metadata for upload:
2024-01-24 06:14:42 [290,367ms] [Warning] [carb.crashreporter-breakpad.plugin]   CarbSdkVersion = '129.3+129.tc95.1c400a99'
2024-01-24 06:14:42 [290,367ms] [Warning] [carb.crashreporter-breakpad.plugin]   DumpId = '7b0adb7c-038a-447a-4ef9618e-0112b38c'
2024-01-24 06:14:42 [290,367ms] [Warning] [carb.crashreporter-breakpad.plugin]   ProductName = 'OmniverseKit'
2024-01-24 06:14:42 [290,367ms] [Warning] [carb.crashreporter-breakpad.plugin]   RetryCount = '0'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   StartupTime = '1706076592'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   UptimeSeconds = '290'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   Version = '104.1+release.332.6846c5b6.tc'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   appName = 'Code'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   appState = 'started'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   appVersion = '2022.3.1-rc.1'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   autoloadExts = ''
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   buildBranch = 'release'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   buildCi = 'tc'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   buildHash = '6846c5b6'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   buildId = '12951306'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   buildMajor = '104'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   buildMinor = '1'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   buildMr = '0'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   buildNumber = '332'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   buildPatch = '0'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   buildVersion = '104.1.0'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   cpuId = 'Intel64 Family 6 Model 151 Stepping 2'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   cpuName = '12th Gen Intel(R) Core(TM) i9-12900KF'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   cpuVendor = 'GenuineIntel'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   environmentName = 'default'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   experience = 'Code'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   externalBuild = '1'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   gpuDriver_0 = '535.154'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   gpuVRAM_0 = '51784974336'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   gpu_0 = 'NVIDIA RTX A6000'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   kitRendererDriverVersion = '535.154'
2024-01-24 06:14:42 [290,368ms] [Warning] [carb.crashreporter-breakpad.plugin]   lastCommand = 'CreateNodeCommand(graph=?,node_path=/Render/PostProcess/SDGPipeline/WriterSyncGate,node_type=omni.graph.action.SyncGate,create_usd=True)'

I am running a Docker image of Replicator: nvcr.io/nvidia/omniverse-replicator:1.6.3

The scene uses a procedurally built shader, attached here:

shader.zip (994 Bytes)

It works fine when opacity is not mapped.

The rendering script is as follows:

import omni.replicator.core as rep
import omni.usd
from pathlib import Path
import sys
import carb

#rep.settings.set_stage_up_axis("z")
DATA_DIR = Path("/hoopla/data")

SCENES = Path("/hoopla/data/dcc_check/scenes")

# The last command-line argument is a glob pattern selecting the scenes to render.
ALL_SCENES = SCENES.glob(sys.argv[-1])


for cur_scene in list(ALL_SCENES):
    print(cur_scene.as_posix())

    CAMERA_NAME = "/World/renderCamera"

    OUTPUT_DIR = f"/hoopla/data/dcc_check/eyecandy/{cur_scene.name}"

    # Open the USD scene in the current (headless) context.
    omni.usd.get_context().open_stage(cur_scene.as_posix())

    num_frames = 600
    a = 1  # resolution multiplier
    with rep.new_layer():
        camera_paths = [CAMERA_NAME]
        render_products = [rep.create.render_product(CAMERA_NAME, (1920*a, 1080*a))]
        with rep.trigger.on_frame(num_frames=num_frames):
            print(f"Rendering {cur_scene}")
        # Write only the RGB output for each frame.
        writer = rep.WriterRegistry.get("BasicWriter")
        writer.initialize(output_dir=OUTPUT_DIR, rgb=True)

        writer.attach(render_products)
        # draft=True uses the real-time renderer; otherwise path tracing is used.
        draft = False
        if draft:
            rep.settings.set_render_rtx_realtime("RTXAA")
        else:
            rep.settings.set_render_pathtraced(128)
        rep.orchestrator.run()

@samuel.hodge I am just another user, but I am curious to know whether the problem persists with Code 2023.1.1. Secondly, judging by the console log snippet alone, does turning off the denoiser in the render settings change anything on your end?
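For a headless script there is no render settings panel, so the denoiser would have to be switched off from code. A minimal sketch follows; the settings key is an assumption on my part and should be checked against the Kit version in the container, and the helper name is hypothetical. It accepts any object with a set(path, value) method so it can be tried outside Kit.

```python
# Assumption: this key toggles the path tracer's OptiX denoiser.
# Verify it against the render settings of your Kit version.
DENOISER_KEY = "/rtx/pathtracing/optixDenoiser/enabled"


def disable_pathtracer_denoiser(settings):
    """Switch the denoiser off via any object exposing set(path, value)."""
    settings.set(DENOISER_KEY, False)
    return DENOISER_KEY


# Inside the Replicator script, before rep.orchestrator.run():
#   import carb.settings
#   disable_pathtracer_denoiser(carb.settings.get_settings())
```

If the crash disappears with the denoiser off, that would point at the "Unable to load denoiser weights" error rather than at the opacity map itself.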

Lastly, it may help the mods/devs troubleshoot your problem if they can look at the most recent .log file in its entirety. You should be able to locate it here: ~/.nvidia-omniverse/logs/Kit/Code/2022.3
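If the log folder holds many files, something like the sketch below picks out the newest one. It uses only the standard library; the function name is hypothetical, and the directory inside a Docker container may differ from the path quoted above.

```python
from pathlib import Path


def newest_log(log_dir):
    """Return the most recently modified *.log file under log_dir, or None."""
    root = Path(log_dir).expanduser()
    logs = [p for p in root.rglob("*.log") if p.is_file()]
    # max() falls back to None when no log file was found.
    return max(logs, key=lambda p: p.stat().st_mtime, default=None)


# Example call with the path from this thread:
#   print(newest_log("~/.nvidia-omniverse/logs/Kit/Code/2022.3"))
```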

It was in a Docker container for Replicator; it was not using Code. I will see if I can get that log out of the Docker image.

Maybe this helps

cat /opt/nvidia/omniverse/code-launcher/kit/PACKAGE-INFO.yaml

Package: kit-sdk
Version: 104.1+release.332.6846c5b6.tc.linux-x86_64.release
Commit: error
Time: Wed Dec  7 06:28:27 2022
CI Build ID: 12951394
Platform: linux-x86_64
CI Build Number: 104.1+release.332.6846c5b6.tc
root@c34b2efa6865:/opt/nvidia/omniverse/code-la

The log was uploaded, and they have the UUID for the log anyway.

If someone wants to give advice about an upgrade path I am happy to follow that advice.

error_log.txt.zip (11.7 KB)
Here is the log.

Let me ask the RTX dev group

I asked the RTX Dev and it seems that you need to use a new container from here:

I will see if this solves the issue

With the new container I needed to upgrade to Ubuntu driver 545. After I did this, the script I was using no longer produced any output, although it did not crash. I do not have time to dedicate to this upgrade at the moment.

Is there a simple way of rendering this scene with Replicator from a given camera to see if it causes the crash?

Here is a good video overview of Replicator and it shows the code and how to “render out”

It seems that this container is defective.

The instructions say:

Accelerating Start up time
You will notice that the first time you launch the container, it has a lengthy start up time of about 2 minutes due to compiling shaders regardless of how much data you are generating. To minimize the start up time, and make sure you can deploy the container in the machine over and over again without the start up time you can follow the next steps:

nvidia-docker run --entrypoint /bin/bash -it nvcr.io/nvidian/ov-synthetic-data-generation:xx

Within the container run:

./cache_script.sh

After that script has run, on a different terminal commit the container (for more info on docker commit, click here)

docker commit [OPTIONS] CONTAINER ov-synthetic-data-generation-startup:v1

CONTAINER here refers to the container on the other terminal. You can find it using docker container ls

After running this, you can close the container and run with the container you committed. Shaders will recompile if you launch this container on a new machine or if the driver is slightly different. Even a patch will make the difference.

Refer to our Omniverse Replicator User Guide for more information.

For one, there is a typo: nvidian != nvidia.

The other thing is that ./cache_script.sh never completes. I left it running for close to ten hours on a machine with a 48-thread EPYC Rome 7402P, 2 RTX 3090 GPUs and 128 GB of RAM, and it still had not finished.

In the past this script has taken less than 10 minutes to run.
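To avoid leaving the warm-up running for hours, it could be wrapped with a time limit. This is only a sketch using the standard library; the helper name and the 30-minute limit are my own choices, not something from the container documentation.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it exceeded timeout_s seconds."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before raising; processes
        # spawned by that child may need separate cleanup.
        return None


# Example: give the shader cache script 30 minutes at most.
#   result = run_with_timeout(["./cache_script.sh"], 30 * 60)
#   print("timed out" if result is None else f"exit code {result}")
```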

Can someone test this for me?

@Richard3D thanks for this video.

This video is running in a graphical viewer with the USD scene already open.

I am using a Docker container with a headless session, running a Python script that opens the USD scene and then calls rep.orchestrator.run() once the layer and writer are set up. The script is in the original post.

I am not sure why it was working with Replicator 1.6.3 but doesn't work with synthetic-data 0.16-beta.

So this was working with Replicator 1.6.3, and in which version does it break? Can you try the same script in a regular GUI session, not headless, and see if it works? We need to isolate a lot of variables. I would try 1) with / without Docker, 2) with / without the GUI, and any other variables you can think of to pinpoint the issue.
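As a rough illustration of that isolation, the combinations can be enumerated up front so each run is recorded against one configuration. The variable names below are illustrative; the opacity variable comes from the original post, where the shader worked when opacity was not mapped.

```python
from itertools import product

# Each variable has two states; the names are illustrative, not an official list.
VARIABLES = {
    "docker": [True, False],
    "gui": [True, False],
    "opacity_mapped": [True, False],
}


def build_matrix(variables):
    """Return one dict per combination of variable values."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*variables.values())]


for config in build_matrix(VARIABLES):
    print(config)
```

Three two-state variables give eight runs; noting crash or no crash next to each row shows quickly which variable the crash follows.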

Let me talk to the experts on this and get back to you.

Yes, this was “working” with Replicator 1.6.3, apart from the crash when the opacity texture is used.

I am not an Omniverse beta tester, and I do not have time to test continuously.

No, I understand, but with those two simple tests we would isolate a huge number of variables. The more data we get from you, the quicker we can resolve your issue. So what version of Replicator are you using now? Can you stick with that for now?

I am using Docker image nvcr.io/nvidia/ov-synthetic-data-generation:0.0.16-beta, but as mentioned earlier the ./cache_script.sh warm-up script does not work.

This is the error I get from this image:

2024-04-19 00:41:11,768 ERROR (render-files): __main__: Render error: 2024-04-19 00:40:53 [8,404ms] [Fatal] [carb.crashreporter-breakpad.plugin] 000: libc.so.6!__sigaction+0x50
2024-04-19 00:41:11,768 ERROR (render-files): __main__: Render error: 2024-04-19 00:40:55 [10,413ms] [Fatal] [carb.crashreporter-breakpad.plugin] 001: libomniclient.so!omniClientFreeContent+0x1110
2024-04-19 00:41:11,768 ERROR (render-files): __main__: Render error: 2024-04-19 00:40:57 [12,519ms] [Fatal] [carb.crashreporter-breakpad.plugin] 002: libomniclient.so!omniClientFreeContent+0x1fce
2024-04-19 00:41:11,768 ERROR (render-files): __main__: Render error: 2024-04-19 00:40:58 [13,613ms] [Fatal] [carb.crashreporter-breakpad.plugin] 003: libomniclient.so!omniClientCombineWithBaseUrl2+0x8a54
2024-04-19 00:41:11,768 ERROR (render-files): __main__: Render error: 2024-04-19 00:41:00 [14,842ms] [Fatal] [carb.crashreporter-breakpad.plugin] 004: libomniclient.so!omniClientCombineWithBaseUrl2+0x1a1d
2024-04-19 00:41:11,768 ERROR (render-files): __main__: Render error: 2024-04-19 00:41:01 [16,017ms] [Fatal] [carb.crashreporter-breakpad.plugin] 005: libomniclient.so!omniClientCombineWithBaseUrl2+0x36cf
2024-04-19 00:41:11,768 ERROR (render-files): __main__: Render error: 2024-04-19 00:41:02 [17,114ms] [Fatal] [carb.crashreporter-breakpad.plugin] 006: libomniclient.so!std::pair<std::__detail::_Node_iterator<unsigned long, true, false>, bool> std::_Hashtable<unsigned long, unsigned long, std::allocator<unsigned long>, std::__detail::_Identity, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::_M_emplace<unsigned long&>(std::integral_constant<bool, true>, unsigned long&)+0x6b1a
2024-04-19 00:41:11,768 ERROR (render-files): __main__: Render error: 2024-04-19 00:41:03 [18,129ms] [Fatal] [carb.crashreporter-breakpad.plugin] 007: libomniclient.so!std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (omniclient::provider_file::FileThread::*)(), omniclient::provider_file::FileThread*> > >::_M_run()+0xc5
2024-04-19 00:41:11,768 ERROR (render-files): __main__: Render error: 2024-04-19 00:41:04 [19,113ms] [Fatal] [carb.crashreporter-breakpad.plugin] 008: libomniclient.so!std::string __gnu_cxx::__to_xstring<std::string, char>(int (*)(char*, unsigned long, char const*, __va_list_tag*), unsigned long, char const*, ...)+0xc4f
2024-04-19 00:41:11,768 ERROR (render-files): __main__: Render error: 2024-04-19 00:41:05 [20,019ms] [Fatal] [carb.crashreporter-breakpad.plugin] 009: libomniclient.so!std::function<void ()>::function<omniclient::provider_file::FileThread::Command<std::function<void (OmniClientResult, std::vector<ListEntry, std::allocator<ListEntry> > const&)> >::fireCallback<OmniClientResult, std::vector<ListEntry, std::allocator<ListEntry> > >(OmniClientResult const&, std::vector<ListEntry, std::allocator<ListEntry> > const&)::{lambda()#1}, void, void>(omniclient::provider_file::FileThread::Command<std::function<void (OmniClientResult, std::vector<ListEntry, std::allocator<ListEntry> > const&)> >::fireCallback<OmniClientResult, std::vector<ListEntry, std::allocator<ListEntry> > >(OmniClientResult const&, std::vector<ListEntry, std::allocator<ListEntry> > const&)::{lambda()#1})+0x34de
2024-04-19 00:41:11,768 ERROR (render-files): __main__: Render error: 2024-04-19 00:41:06 [20,921ms] [Fatal] [carb.crashreporter-breakpad.plugin] 010: libomniclient.so!omniclient::Core::singleTick(std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >&)+0x332
2024-04-19 00:41:11,768 ERROR (render-files): __main__: Render error: 2024-04-19 00:41:07 [21,830ms] [Fatal] [carb.crashreporter-breakpad.plugin] 011: libomniclient.so!omniclient::Core::TickThread::tickThread()+0xce
2024-04-19 00:41:11,768 ERROR (render-files): __main__: Render error: 2024-04-19 00:41:08 [22,757ms] [Fatal] [carb.crashreporter-breakpad.plugin] 012: libstdc++.so.6!std::error_code::default_error_condition() const+0x33
2024-04-19 00:41:11,768 ERROR (render-files): __main__: Render error: 2024-04-19 00:41:09 [23,725ms] [Fatal] [carb.crashreporter-breakpad.plugin] 013: libc.so.6!pthread_condattr_setpshared+0x513
2024-04-19 00:41:11,768 ERROR (render-files): __main__: Render error: 2024-04-19 00:41:10 [24,723ms] [Fatal] [carb.crashreporter-breakpad.plugin] 014: libc.so.6!__xmknodat+0x230
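
A backtrace like this is easier to read once the frames are grouped by library. The sketch below assumes the frame format seen above ("NNN: library!symbol+offset" after the plugin tag); the function name is hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "...plugin] 001: libomniclient.so!omniClientFreeContent+0x1110"
FRAME_RE = re.compile(r"\] (\d{3}): ([^!\s]+)!")


def frames_per_module(log_text):
    """Count backtrace frames per shared library in crash reporter output."""
    matches = (FRAME_RE.search(line) for line in log_text.splitlines())
    return Counter(m.group(2) for m in matches if m)


# Example: print the libraries in a saved log, most frequent first.
#   with open("error_log.txt") as f:
#       for module, count in frames_per_module(f.read()).most_common():
#           print(count, module)
```

In the trace above, 11 of the 15 frames (001 to 011) are in libomniclient.so, on its file provider thread, which suggests the crash happens while the client library is listing or reading files rather than in the renderer.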