Isaac Sim Version:
4.2.0
Last known working version: Isaac Sim 4.0.0
Operating System:
Ubuntu 22.04
GPU Information
- Model: NVIDIA L4
- Driver Version: 565.57.01
Topic Description
I am trying to run an SDG (synthetic data generation) scene on an EC2 g6.12xlarge instance. With 2 GPUs, everything works and SDG functions as expected. With 4 GPUs, Isaac Sim 4.2.0 crashes with the GPU crash and semaphore timeout errors listed under Error Messages. The same 4-GPU setup worked with Isaac Sim 4.0.0.
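For anyone trying to reproduce this, here is a minimal sketch of how the GPU count could be varied between runs (2, 3, 4) to narrow down where the crash starts. It assumes the scene is launched from a standalone Python script through `SimulationApp`. The `multi_gpu` and `max_gpu_count` keys are my understanding of the `SimulationApp` launch config and should be checked against the documentation for your Isaac Sim version. `make_launch_config` is a hypothetical helper, not part of Isaac Sim.

```python
def make_launch_config(gpu_count, headless=True):
    """Build a launch-config dict that caps the number of render GPUs.

    Hypothetical helper. The key names below are assumed to match the
    SimulationApp launch config; verify them for your Isaac Sim version.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return {
        "headless": headless,          # the logs above show no windowing
        "multi_gpu": gpu_count > 1,    # assumed key: enable multi-GPU rendering
        "max_gpu_count": gpu_count,    # assumed key: upper bound on GPUs used
    }


# One config per run, to find the smallest GPU count that crashes.
CONFIGS = {n: make_launch_config(n) for n in (2, 3, 4)}

# In a standalone script this would be passed to the app, for example
# (assumption, not executed here):
#   from isaacsim import SimulationApp
#   app = SimulationApp(make_launch_config(4))
```

Running the same SDG script once per entry in `CONFIGS` would show whether 3 GPUs already fail or only 4 do.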
Error Messages
[74.025s] app ready
2024-12-03 16:27:38 [74,478ms] [Warning] [omni.kit.imgui_renderer.plugin] _createExtendCursor: No windowing.
2024-12-03 16:27:38 [74,478ms] [Warning] [omni.kit.imgui_renderer.plugin] _createExtendCursor: No windowing.
[75.632s] Simulation App Startup Complete
2024-12-03 16:27:53 [89,044ms] [Warning] [omni.hydra.scene_delegate.plugin] Calling getBypassRenderSkelMeshProcessing for prim /World/TableOutput.proto_collider_leg_01_id3 that has not been populated
2024-12-03 16:28:27 [122,836ms] [Warning] [omni.syntheticdata.plugin] OgnSdPostRenderVarToHost : rendervar copy from texture directly to host buffer is counter-performant. Please use copy from texture to device buffer first.
2024-12-03 16:28:31 [127,207ms] [Error] [carb.graphics-vulkan.plugin] GPU crash is detected. Shader debug is written into: /home/ubuntu/.local/share/ov/pkg/isaac-sim-4.2.0/kit/logs/Kit/Isaac-Sim/4.2/kit_20241203_162624-000072618313b250-0000724a819cdf90.nvdbg
2024-12-03 16:28:31 [127,209ms] [Error] [carb.graphics-vulkan.plugin] GPU crash is detected. Crash dump is written into: /home/ubuntu/.local/share/ov/pkg/isaac-sim-4.2.0/kit/logs/Kit/Isaac-Sim/4.2/kit_20241203_162624-0.nv-gpudmp
2024-12-03 16:28:31 [127,209ms] [Error] [carb.graphics-vulkan.plugin] GPU crash dump is successfully written
2024-12-03 16:29:27 [182,807ms] [Fatal] [rtx.scenedb.plugin] Waiting on Semaphore 6 for longer than 60s: Failure to complete CopyCommandList: Copy Context Geometry copy engine command list command list
2024-12-03 16:29:27 [182,899ms] [Fatal] [rtx.scenedb.plugin] Waiting on Semaphore 8 for longer than 60s: Failure to complete CopyCommandList: Copy Context Geometry copy engine command list command list
2024-12-03 16:29:27 [182,900ms] [Fatal] [rtx.scenedb.plugin] Waiting on Semaphore 9 for longer than 60s: Failure to complete CopyCommandList: Copy Context Geometry copy engine command list command list
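To make longer logs easier to compare between the 2-GPU and 4-GPU runs, the following is a rough sketch that pulls out only the `[Error]` and `[Fatal]` entries and counts them per plugin. It assumes the line layout seen in the excerpt above (date, time, `[NNN,NNNms]`, `[Level]`, `[source]`, message). `parse_log` and `summarize` are hypothetical helper names, and the sample is a subset of the lines quoted above.

```python
import re
from collections import Counter

# Matches log lines shaped like the ones quoted above, e.g.
#   2024-12-03 16:28:31 [127,209ms] [Error] [carb.graphics-vulkan.plugin] message
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<ms>[\d,]+)ms\] \[(?P<level>\w+)\] \[(?P<source>[^\]]+)\] "
    r"(?P<message>.*)$"
)


def parse_log(text):
    """Return one dict per timestamped log line; other lines are skipped."""
    records = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # e.g. "[74.025s] app ready" has no date or level
        record = match.groupdict()
        record["ms"] = int(record["ms"].replace(",", ""))
        records.append(record)
    return records


def summarize(records, levels=("Error", "Fatal")):
    """Count records per (level, source) for the given severity levels."""
    return Counter(
        (r["level"], r["source"]) for r in records if r["level"] in levels
    )


SAMPLE = """\
[74.025s] app ready
2024-12-03 16:27:38 [74,478ms] [Warning] [omni.kit.imgui_renderer.plugin] _createExtendCursor: No windowing.
2024-12-03 16:28:31 [127,209ms] [Error] [carb.graphics-vulkan.plugin] GPU crash dump is successfully written
2024-12-03 16:29:27 [182,807ms] [Fatal] [rtx.scenedb.plugin] Waiting on Semaphore 6 for longer than 60s: Failure to complete CopyCommandList: Copy Context Geometry copy engine command list command list
2024-12-03 16:29:27 [182,899ms] [Fatal] [rtx.scenedb.plugin] Waiting on Semaphore 8 for longer than 60s: Failure to complete CopyCommandList: Copy Context Geometry copy engine command list command list
"""

if __name__ == "__main__":
    for (level, source), count in summarize(parse_log(SAMPLE)).items():
        print(f"{level:5s} {source}: {count}")
```

The `ms` field is kept as an integer so the time between the first GPU crash report and the first semaphore timeout can be computed directly from the records.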
Additional Information
|---------------------------------------------------------------------------------------------|
| Driver Version: 565.57.01 | Graphics API: Vulkan
|=============================================================================================|
| GPU | Name | Active | LDA | GPU Memory | Vendor-ID | LUID |
| | | | | | Device-ID | UUID |
| | | | | | Bus-ID | |
|---------------------------------------------------------------------------------------------|
| 0 | NVIDIA L4 | Yes: 0 | | 23034 MB | 10de | 0 |
| | | | | | 27b8 | c3475aba… |
| | | | | | 38 | |
|---------------------------------------------------------------------------------------------|
| 1 | NVIDIA L4 | Yes: 1 | | 23034 MB | 10de | 0 |
| | | | | | 27b8 | 10989674… |
| | | | | | 3a | |
|---------------------------------------------------------------------------------------------|
| 2 | NVIDIA L4 | | | 23034 MB | 10de | 0 |
| | | | | | 27b8 | 379e3e69… |
| | | | | | 3c | |
|---------------------------------------------------------------------------------------------|
| 3 | NVIDIA L4 | | | 23034 MB | 10de | 0 |
| | | | | | 27b8 | 4dad17a1… |
| | | | | | 3e | |
|=============================================================================================|
| OS: 22.04.5 LTS (Jammy Jellyfish) ubuntu, Version: 22.04.5, Kernel: 6.8.0-1019-aws
| Processor: AMD EPYC 7R13 Processor | Cores: 24 | Logical: 48
|---------------------------------------------------------------------------------------------|
| Total Memory (MB): 186124 | Free Memory: 164767
| Total Page/Swap (MB): 0 | Free Page/Swap: 0
|---------------------------------------------------------------------------------------------|
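In the table above, only GPUs 0 and 1 have a value in the Active column ("Yes: 0", "Yes: 1"); the cells for GPUs 2 and 3 are empty. I am not certain what an empty cell means here, so the sketch below only extracts what the table states. It assumes the pipe-separated layout shown above; `parse_gpu_rows` is a hypothetical helper, and the sample rows are copied from the table.

```python
def parse_gpu_rows(table_text):
    """Extract index, name, active flag and memory from the GPU table rows.

    Only rows whose first cell is a GPU index are used; header, separator
    and continuation rows (Device-ID, Bus-ID, UUID) are skipped.
    """
    gpus = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) < 5 or not cells[0].isdigit():
            continue
        gpus.append(
            {
                "index": int(cells[0]),
                "name": cells[1],
                "active": cells[2].startswith("Yes"),
                "memory_mb": int(cells[4].split()[0]),
            }
        )
    return gpus


SAMPLE_TABLE = """\
| GPU | Name | Active | LDA | GPU Memory | Vendor-ID | LUID |
|---------------------------------------------------------------------------------------------|
| 0 | NVIDIA L4 | Yes: 0 | | 23034 MB | 10de | 0 |
| | | | | | 27b8 | c3475aba… |
| 1 | NVIDIA L4 | Yes: 1 | | 23034 MB | 10de | 0 |
| 2 | NVIDIA L4 | | | 23034 MB | 10de | 0 |
| 3 | NVIDIA L4 | | | 23034 MB | 10de | 0 |
"""

if __name__ == "__main__":
    for gpu in parse_gpu_rows(SAMPLE_TABLE):
        state = "active" if gpu["active"] else "not marked active"
        print(f"GPU {gpu['index']} ({gpu['name']}, {gpu['memory_mb']} MB): {state}")
```

Comparing this output between a 2-GPU and a 4-GPU run would confirm how many devices the renderer actually picked up in each case.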