Docker data generation: Replicator 1.6.3, Pathtraced WITHOUT motion blur? How?

Optix denoiser error - #4 by michal.stanik maybe this helps trigger some notes

This from @dhart might also help

Also valid

@Turowicz can feel my pain with Optix in Docker

Adding -v /usr/share/nvidia/nvoptix.bin:/usr/share/nvidia/nvoptix.bin is a workaround for docker.
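One pitfall worth checking with this bind mount (my own sketch, not something posted in the thread): if the host path does not exist when the container starts, Docker's -v creates an empty directory at that path, so the container ends up with a directory called nvoptix.bin instead of the file. A small check run inside the container could look like this. The helper name check_optix_blob and its return strings are made up for illustration.

```python
import os
import stat


def check_optix_blob(path="/usr/share/nvidia/nvoptix.bin"):
    """Classify what is present at the expected nvoptix.bin location.

    Returns one of "missing", "directory", "empty" or "ok".
    These labels are illustrative only, not part of any NVIDIA tooling.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return "missing"
    if stat.S_ISDIR(st.st_mode):
        # Typical result of a bind mount whose host source did not exist.
        return "directory"
    if st.st_size == 0:
        return "empty"
    return "ok"


if __name__ == "__main__":
    print(check_optix_blob())
```

Anything other than "ok" suggests the mount did not deliver the file from the host.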

Ick!

Not doable for Kubernetes I don’t think

It’s a security breach for a container to access files on the host node, but I will ask the admin for an exception

Sam

Easily doable on Kubernetes; it depends on your gpu-operator settings. As for security, make sure to mount it with readOnly: true.

It’s a small world after all

That you don’t need

I am glad to be wrong. Next time we are in the same place at the same time I owe you a beverage of your choice

Looks like some study is required

Looks like the only way to get the nvoptix.bin from the host node is a security grey zone

see:
https://makocchi.medium.com/kubernetes-cve-2017-1002101-en-5a30bf701a3e

Support for /usr/share/nvidia/nvoptix.bin in the k8s runtime environment doesn’t seem to be there with the way our k8s cluster is set up.

Running Canonical’s microk8s GPU addon set to host works perfectly every time, and this is my development environment.

But my production environment needs me to somehow mount the .bin file at the path where Isaac Sim 2023.1.1 (Kit) expects it to be.

@Turowicz sorry to be a pain in the rump, but can you send me a way of doing this using hostPath? We get the following error

the snippet we have is

            volumeMounts:
              - name: aiml-engineering-data
                mountPath: /datamount
              - name: optix-nvidia 
                mountPath: "/usr/share/nvidia" 
                subPath: nvoptix.bin 
                readOnly: true 
        volumes:
        - name: aiml-engineering-data
          persistentVolumeClaim:
            claimName: aiml-engineering-data
        - name: optix-nvidia 
          hostPath: 
            path: "/usr/share/nvidia/nvoptix.bin" 
            type: File 

the error reads

Warning Failed 6s (x4 over 31s) kubelet Error: failed to prepare subPath for volumeMount "optix-nvidia" of container "app"

What a n00b

Here is what works: drop subPath and point mountPath at the full file path.

            volumeMounts:
              - name: aiml-engineering-data
                mountPath: /datamount
              - name: optix-nvidia
                mountPath: "/usr/share/nvidia/nvoptix.bin"
                readOnly: true
        volumes:
        - name: aiml-engineering-data
          persistentVolumeClaim:
            claimName: aiml-engineering-data
        - name: optix-nvidia
          hostPath:
            path: "/usr/share/nvidia/nvoptix.bin"
            type: File

Not quite there

[6535188.938073] NVRM: GPU at PCI:0000:01:00: GPU-
[6535188.938078] NVRM: GPU Board Serial Number: 
[6535188.938079] NVRM: Xid (PCI:0000:01:00): 109, pid=2116254, name=kit, Ch 00000021, errorString CTX SWITCH TIMEOUT, Info 0x108010
[6535193.687030] NVRM: GPU at PCI:0000:c1:00: GPU-
[6535193.687037] NVRM: GPU Board Serial Number: 
[6535193.687038] NVRM: Xid (PCI:0000:c1:00): 109, pid=2877153, name=kit, Ch 00000021, errorString CTX SWITCH TIMEOUT, Info 0x108010

@pcallender can we report this upstream to the driver team?
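For a report like this it can help to pull the Xid events out of dmesg into structured fields. The following is a rough sketch that only assumes the line layout shown in the lines above. The names XID_RE, XidEvent and parse_xid are hypothetical, and lines with a non-numeric pid would not match.

```python
import re
from typing import NamedTuple, Optional

# Pattern modelled on the NVRM Xid lines quoted in this thread.
XID_RE = re.compile(
    r"NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+), "
    r"pid=(?P<pid>\d+), name=(?P<name>[^,]+), (?P<rest>.*)"
)


class XidEvent(NamedTuple):
    pci: str     # PCI address of the GPU, e.g. "PCI:0000:01:00"
    xid: int     # Xid code, e.g. 109
    pid: int     # process id reported by the driver
    name: str    # process name, e.g. "kit"
    detail: str  # remainder of the line (channel, error string, info)


def parse_xid(line: str) -> Optional[XidEvent]:
    """Return an XidEvent for an NVRM Xid line, or None for any other line."""
    m = XID_RE.search(line)
    if m is None:
        return None
    return XidEvent(
        m["pci"], int(m["xid"]), int(m["pid"]), m["name"], m["rest"].strip()
    )
```

Feeding it the output of dmesg line by line and keeping the results that are not None gives one record per event, which is easier to attach to a bug report than the raw kernel log.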

Getting a lot of

2024-05-23 05:43:25 [560,338ms] [Error] [omni.syntheticdata.plugin] OgnSdStageSemanticInstanceMapping missing filteredLabelMap semLabelMap in the history.

Hard crash

2024-05-23 05:47:35 [299,500ms] [Error] [omni.syntheticdata.plugin] OgnSdStageSemanticInstanceMapping missing filteredLabelMap semLabelMap in the history.
2024-05-23 05:47:38 [302,162ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:47:38 [302,173ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:47:38 [302,220ms] [Error] [omni.syntheticdata.plugin] OgnSdStageSemanticInstanceMapping missing filteredLabelMap semLabelMap in the history.
2024-05-23 05:47:40 [304,870ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:47:40 [304,881ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:47:41 [304,928ms] [Error] [omni.syntheticdata.plugin] OgnSdStageSemanticInstanceMapping missing filteredLabelMap semLabelMap in the history.
2024-05-23 05:47:43 [307,576ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:47:43 [307,584ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:47:43 [307,615ms] [Error] [omni.syntheticdata.plugin] OgnSdStageSemanticInstanceMapping missing filteredLabelMap semLabelMap in the history.
2024-05-23 05:47:46 [310,277ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:47:46 [310,289ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:47:46 [310,328ms] [Error] [omni.syntheticdata.plugin] OgnSdStageSemanticInstanceMapping missing filteredLabelMap semLabelMap in the history.
2024-05-23 05:47:49 [312,988ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:47:49 [312,997ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:47:49 [313,032ms] [Error] [omni.syntheticdata.plugin] OgnSdStageSemanticInstanceMapping missing filteredLabelMap semLabelMap in the history.
2024-05-23 05:47:51 [315,711ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:47:51 [315,725ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:47:51 [315,763ms] [Error] [omni.syntheticdata.plugin] OgnSdStageSemanticInstanceMapping missing filteredLabelMap semLabelMap in the history.
2024-05-23 05:47:54 [318,402ms] [Info] [omni.usd.audio] resetting the animation timeline
2024-05-23 05:47:54 [318,427ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:47:54 [318,444ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:47:54 [318,489ms] [Error] [omni.syntheticdata.plugin] OgnSdStageSemanticInstanceMapping missing filteredLabelMap semLabelMap in the history.
2024-05-23 05:47:57 [321,147ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:47:57 [321,158ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:47:57 [321,194ms] [Error] [omni.syntheticdata.plugin] OgnSdStageSemanticInstanceMapping missing filteredLabelMap semLabelMap in the history.
2024-05-23 05:47:59 [323,833ms] [Info] [carb] Initializing plugin: carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.4]) (impl: carb.windowing-glfw.plugin)
2024-05-23 05:47:59 [323,835ms] [Warning] [carb.windowing-glfw.plugin] GLFW initialization failed.
2024-05-23 05:47:59 [323,835ms] [Warning] [carb] Failed to startup plugin carb.windowing-glfw.plugin (interfaces: [carb::windowing::IGLContext v1.0],[carb::windowing::IWindowing v1.4]) (impl: carb.windowing-glfw.plugin)
2024-05-23 05:47:59 [323,857ms] [Info] [omni.physx.foundation.plugin] getCudaDeviceOrdinal: deviceIndex 0 used as offset into a deviceGroup. DeviceOrdinal is 0.
2024-05-23 05:47:59 [323,857ms] [Info] [omni.physx.plugin] Using CUDA device ordinal 0.
2024-05-23 05:47:59 [323,857ms] [Info] [omni.physx.plugin] Using CUDA device ordinal 0
2024-05-23 05:47:59 [323,858ms] [Info] [omni.usd.audio] resetting the animation timeline
2024-05-23 05:47:59 [323,859ms] [Info] [omni.kit.menu.utils.scripts.utils] omni.kit.menu.utils.rebuild_menus
2024-05-23 05:48:00 [324,321ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:48:00 [324,328ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:48:00 [324,359ms] [Error] [omni.syntheticdata.plugin] OgnSdStageSemanticInstanceMapping missing filteredLabelMap semLabelMap in the history.
2024-05-23 05:48:03 [327,007ms] [Info] [omni.kit.menu.utils.scripts.utils] omni.kit.menu.utils.rebuild_menus
2024-05-23 05:48:03 [327,205ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:48:03 [327,211ms] [Info] [rtx.raytracing.plugin] Iteration: 0, global error = 0.000000 - total SPP reached.
2024-05-23 05:48:03 [327,244ms] [Error] [omni.syntheticdata.plugin] OgnSdStageSemanticInstanceMapping missing filteredLabelMap semLabelMap in the history.
2024-05-23 05:48:05 [329,885ms] [Info] [omni.isaac.synthetic_recorder.synthetic_recorder_extension] Overwriting config file /isaac-sim/exts/omni.isaac.synthetic_recorder/data/last_config.json.
2024-05-23 05:48:05 [329,885ms] [Info] [carb.crashreporter-breakpad.plugin] enabled: false
2024-05-23 05:48:05 [329,896ms] [Info] [omni.graph.exec.unstable.passRegistry] IPassRegistry::deregisterPass: deregistered ActionGraphPass
2024-05-23 05:48:05 [329,896ms] [Info] [omni.graph.exec.unstable.passRegistry] IPassRegistry::deregisterPass: deregistered DependencyFoldPass
2024-05-23 05:48:05 [329,897ms] [Warning] [omni.graph.core.plugin] Node type name '' is missing the unique namespace
2024-05-23 05:48:05 [329,899ms] [Warning] [carb] [Plugin: carb.taskagent.plugin] Module /isaac-sim/kit/exts/omni.taskagent/bin/deps/libcarb.taskagent.plugin.so remained loaded after unload request
2024-05-23 05:48:05 [329,900ms] [Warning] [carb] [Plugin: omni.spectree.delegate.plugin] Module /isaac-sim/kit/exts/omni.usd_resolver/bin/libomni.spectree.delegate.plugin.so remained loaded after unload request
2024-05-23 05:48:06 [329,956ms] [Info] [omni.graph.exec.unstable.passRegistry] IPassRegistry::deregisterPass: deregistered PassStronglyConnectedComponents
2024-05-23 05:48:06 [329,958ms] [Warning] [omni.core.ITypeFactory] Module /isaac-sim/kit/exts/omni.activity.core/bin/libomni.activity.core.plugin.so remained loaded after unload request.
command terminated with exit code 137
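A note on that last line: by shell convention an exit code above 128 means the process was terminated by signal (code - 128), so 137 corresponds to signal 9, SIGKILL. In a container that is often, though not always, the kernel OOM killer enforcing a memory limit. A minimal sketch for decoding such codes follows. The function name describe_exit_code is made up for illustration, and the signal lookup assumes a Linux or other POSIX host.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a shell-style exit code into a short description."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(137))
```

If the code decodes to SIGKILL, checking the pod's last state and the node's kernel log for an OOM event is a reasonable next step.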

Posting related report

@samuel.hodge For the crash, would you be able to post the full log? How reproducible is this? Does it consistently crash at the same time? A small repro would be ideal.

For "OgnSdStageSemanticInstanceMapping missing filteredLabelMap semLabelMap in the history", do you have anything with a semantic label in the scene? (For now, I’m treating these as two separate things.) Thanks for the pointer to the other post, I’ll follow up there after this to get more info.

You pinged me above about reporting to the driver team. Can you elaborate on what the driver issue is? This isn’t my area, and there’s a lot in this thread. Info on what the issue is, how you got it, and what’s expected would really help to get this to the right people. @Richard3D maybe you have a bit more context for driver issues?

The error is consistent on the HPC cluster with driver 535 but not reproducible on my workstation with driver 545.

I will supply you with full logs from the three out of twenty scenes on which it occurred.

You have had my source for creating the semantic labels since Monday, in the first post.

edit: 3 of the scenes were using more resources than the pod had available. I will try in Docker with unlimited resources and see if I get a good result.

Raising the amount of available DRAM by 20% fixes the issue. All apologies for the old man shouting at the sky.
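For anyone applying the same fix to a pod spec, here is a small sketch for bumping a Kubernetes-style memory quantity by a percentage. The thread does not state the actual limit, so the 32Gi in the example is a made-up value. The function name raise_memory_limit is hypothetical too, and it only handles whole-number Mi, Gi and Ti quantities.

```python
# Binary suffixes expressed in MiB. Only these three are handled here.
_UNITS_MIB = {"Mi": 1, "Gi": 1024, "Ti": 1024 * 1024}


def raise_memory_limit(quantity: str, percent: int = 20) -> str:
    """Return the quantity increased by `percent`, rounded up to whole MiB."""
    for suffix, factor in _UNITS_MIB.items():
        if quantity.endswith(suffix):
            mib = int(quantity[: -len(suffix)]) * factor
            break
    else:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    # Integer ceiling division avoids floating point rounding surprises.
    scaled = -(-mib * (100 + percent) // 100)
    return f"{scaled}Mi"


if __name__ == "__main__":
    # 32Gi is an illustrative value, not the limit used in this thread.
    print(raise_memory_limit("32Gi"))
```

The result is always reported in Mi so that it can be pasted directly into resources.limits.memory without losing the rounding.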