AODT RAN simulation fails

Hi community:
I am deploying AODT on Ubuntu 22.04.4. The nvidia-smi output is:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A6000               Off |   00000000:04:00.0 Off |                  Off |
| 56%   81C    P2            253W /  300W |    5011MiB /  49140MiB |     86%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA RTX A6000               Off |   00000000:82:00.0 Off |                  Off |
| 30%   35C    P8             27W /  300W |    3712MiB /  49140MiB |     38%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA RTX A6000               Off |   00000000:83:00.0 Off |                  Off |
| 30%   34C    P8             19W /  300W |    3716MiB /  49140MiB |     41%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A   1392248    C+G   ...cal/share/ov/pkg/asim-1.0.0/kit/kit       4726MiB |
|    0   N/A  N/A   1415868      C   ./aodt_sim                                    260MiB |
|    1   N/A  N/A   1392248    C+G   ...cal/share/ov/pkg/asim-1.0.0/kit/kit       3692MiB |
|    2   N/A  N/A   1392248    C+G   ...cal/share/ov/pkg/asim-1.0.0/kit/kit       3696MiB |
+-----------------------------------------------------------------------------------------+

I tried to validate the EM simulation with Kyoto_flat.usd, as shown in the figure below:

Then, I changed the scenario setting to RAN:
scenario property
The error message was:

Therefore, I modified the configuration like:

As soon as I started UE mobility, the worker lost connection.

Any idea what the problem is? Could you please show me the correct panel configuration and explain why the worker loses connection? I encounter the same problem when I use Tokyo.usd for EM simulation.

@guofachang The configuration is not the cause of the lost connection. You are configuring the panels correctly: both panel_01 and panel_02 should be 4TR. If you lose the connection, please restart the backend and try connecting again.

@guofachang I encountered the same problem previously. Please note that the default antenna in AODT is dual polarized, so you need to configure both panel1 and panel2 with a 2x1 array (or 1x2 array) configuration for the 4T4R requirement.
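As a quick check of the arithmetic behind that advice, here is a minimal sketch. It assumes a panel's antenna count is rows × columns × polarizations; `panel_ports` is a made-up helper name, not part of AODT.

```shell
# Hypothetical helper: antenna ports for one panel.
# Assumes ports = rows * columns * polarizations.
panel_ports() {
  echo $(( $1 * $2 * $3 ))
}

panel_ports 2 1 2   # 2x1 array, dual polarized
panel_ports 1 2 2   # 1x2 array, dual polarized
```

Both calls print 4, which is the per-panel count needed for 4T4R.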

@junxian Thank you for your reply. Have you ever encountered “Worker has lost connection”?
I encounter this issue every time I start UE mobility in this situation, and also when I start UE mobility in EM mode with the Tokyo.usd map.
I can’t pinpoint where the problem is, since the issue occurs in the same place.
Do you have any suggestions to resolve this issue?

@guofachang Based on my experience, an incorrect configuration of the RAN simulation can also result in a “Worker has lost connection” error. To diagnose this, you can first disable the RAN simulation to verify if it is the source of the problem. Additionally, you can check the console tab for more log messages or use the “docker logs” command to print the error messages from the backend.
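For example, the backend log could be narrowed down to likely crash lines with something like the sketch below. The container name `backend_bundle-connector-1` is the one used later in this thread and may differ on your machine; `crash_lines` is a hypothetical helper, and its keyword list is only a guess at what is worth looking for.

```shell
# Hypothetical helper: keep only lines that look like crash output.
crash_lines() {
  grep -E -i 'terminate called|exception|error'
}

# Filter the last 200 log lines of the backend container, if docker is available.
# The container name is an assumption; check yours with "docker ps".
if command -v docker >/dev/null 2>&1; then
  docker logs --tail 200 backend_bundle-connector-1 2>&1 | crash_lines || true
fi
```

Because `docker logs` may write to stderr as well as stdout, the sketch merges both streams before filtering.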

@kpasad1 Hi, I verified the "Worker has lost connection" issue by toggling the Simulate RAN option. I encountered this issue every time Simulate RAN was enabled.

First, I disabled Simulate RAN and set panel 01 and panel 02 to the same configuration:
image

The EM simulation seems to work OK with Slot Mode:

Then I enabled Simulate RAN, pressed Generate UE Mobility, and then Start UE Mobility:


The problem came up again.

I checked the console. The full console log is attached, and the error is listed below:

I ran docker logs backend_bundle-connector-1, and the suspicious part may be:

[DEBUG] Got message type: "heartbeat_reply"
[INFO] Did not find attributes sim:duration/sim:interval with a positive value so using slot/symbol instead
[DEBUG] Scenario users: 1, batches: 1, slot/symbol mode: 1, slots_per_batch: 10, samples_per_slot: 1, duration (per batch): 0.005, interval: 0.0005, ue_min_speed=1.5, ue_max_speed=2.5, is_seeded=0, seed=0

====================================

TDD pattern: DDDDDDDDDDDDDDDDDDDD
gNB power      43.00 dBm
UE power       26.00 dBm
gNB antennas   4 
UE antennas    4 
DL HARQ        0 
UL HARQ        0 
====================================
terminate called after throwing an instance of 'cuphy::cuphy_fn_exception'
  what():  Function cuphyConvertTensor() returned CUPHY_STATUS_INTERNAL_ERROR: Internal error
Starting container...

In addition, when I opened AODT from the command line, I got:

administrator@ubuntu22:~/aodt_bundle$ ./omniverse-launcher-linux.AppImage 
06:10:13.853 ? Omniverse Launcher 1.9.11 (production)
06:10:13.874 ? Argv: /tmp/.mount_omnivetDmQWk/omniverse-launcher
06:10:13.875 ? Crash dumps directory: /home/administrator/.config/omniverse-launcher/Crashpad
06:10:13.883 ? Start polling Launcher updates.
06:10:14.150 ? Reset current installer.
06:10:14.167 ? Running production web server.
06:10:14.180 ? HTTP endpoints listening at http://localhost:33480
06:10:14.181 ? HTTP endpoints listening at http://127.0.0.1:33480
06:10:14.192 ? Sharing: false
06:10:14.332 ? Started the Navigator web server on 127.0.0.1:34080.
[80495:0806/061017.161214:ERROR:viz_main_impl.cc(186)] Exiting GPU process due to errors during initialization
[80653:0806/061020.220381:ERROR:angle_platform_impl.cc(43)] RendererVk.cpp:157 (VerifyExtensionsPresent): Extension not supported: VK_KHR_surface
ERR: RendererVk.cpp:157 (VerifyExtensionsPresent): Extension not supported: VK_KHR_surface
[80653:0806/061020.220589:ERROR:angle_platform_impl.cc(43)] RendererVk.cpp:157 (VerifyExtensionsPresent): Extension not supported: VK_KHR_xcb_surface
ERR: RendererVk.cpp:157 (VerifyExtensionsPresent): Extension not supported: VK_KHR_xcb_surface
[80653:0806/061020.220675:ERROR:angle_platform_impl.cc(43)] Display.cpp:1019 (initialize): ANGLE Display::initialize error 0: Internal Vulkan error (-7): A requested extension is not supported, in ../../third_party/angle/src/libANGLE/renderer/vulkan/RendererVk.cpp, enableInstanceExtensions:1639.
ERR: Display.cpp:1019 (initialize): ANGLE Display::initialize error 0: Internal Vulkan error (-7): A requested extension is not supported, in ../../third_party/angle/src/libANGLE/renderer/vulkan/RendererVk.cpp, enableInstanceExtensions:1639.
[80653:0806/061020.220729:ERROR:gl_display.cc(504)] EGL Driver message (Critical) eglInitialize: Internal Vulkan error (-7): A requested extension is not supported, in ../../third_party/angle/src/libANGLE/renderer/vulkan/RendererVk.cpp, enableInstanceExtensions:1639.
[80653:0806/061020.220786:ERROR:gl_display.cc(793)] eglInitialize SwANGLE failed with error EGL_NOT_INITIALIZED
[80653:0806/061020.220852:ERROR:gl_display.cc(819)] Initialization of all EGL display types failed.
[80653:0806/061020.220891:ERROR:gl_ozone_egl.cc(26)] GLDisplayEGL::Initialize failed.
[80653:0806/061023.247911:ERROR:angle_platform_impl.cc(43)] RendererVk.cpp:157 (VerifyExtensionsPresent): Extension not supported: VK_KHR_surface
ERR: RendererVk.cpp:157 (VerifyExtensionsPresent): Extension not supported: VK_KHR_surface
[80653:0806/061023.248075:ERROR:angle_platform_impl.cc(43)] RendererVk.cpp:157 (VerifyExtensionsPresent): Extension not supported: VK_KHR_xcb_surface
ERR: RendererVk.cpp:157 (VerifyExtensionsPresent): Extension not supported: VK_KHR_xcb_surface
[80653:0806/061023.248167:ERROR:angle_platform_impl.cc(43)] Display.cpp:1019 (initialize): ANGLE Display::initialize error 0: Internal Vulkan error (-7): A requested extension is not supported, in ../../third_party/angle/src/libANGLE/renderer/vulkan/RendererVk.cpp, enableInstanceExtensions:1639.
ERR: Display.cpp:1019 (initialize): ANGLE Display::initialize error 0: Internal Vulkan error (-7): A requested extension is not supported, in ../../third_party/angle/src/libANGLE/renderer/vulkan/RendererVk.cpp, enableInstanceExtensions:1639.
[80653:0806/061023.248218:ERROR:gl_display.cc(504)] EGL Driver message (Critical) eglInitialize: Internal Vulkan error (-7): A requested extension is not supported, in ../../third_party/angle/src/libANGLE/renderer/vulkan/RendererVk.cpp, enableInstanceExtensions:1639.
[80653:0806/061023.248286:ERROR:gl_display.cc(793)] eglInitialize SwANGLE failed with error EGL_NOT_INITIALIZED
[80653:0806/061023.248328:ERROR:gl_display.cc(819)] Initialization of all EGL display types failed.
[80653:0806/061023.248365:ERROR:gl_ozone_egl.cc(26)] GLDisplayEGL::Initialize failed.
[80653:0806/061023.250291:ERROR:viz_main_impl.cc(186)] Exiting GPU process due to errors during initialization
06:10:23.503 ? Saving omniverse-launcher.desktop file to /tmp/omniverse-launcher-2071tZ...
06:10:23.504 ? 
 [Desktop Entry]
Name=omniverse-launcher
Exec="/home/administrator/aodt_bundle/omniverse-launcher-linux.AppImage"  %u
Type=Application
Terminal=false
MimeType=x-scheme-handler/omniverse-launcher

06:10:23.738 ? Saving omniverse.desktop file to /tmp/omniverse-launcher-koyYvb...
06:10:23.739 ? 
 [Desktop Entry]
Name=omniverse-launcher
Exec="/home/administrator/aodt_bundle/omniverse-launcher-linux.AppImage"  %u
Type=Application
Terminal=false
MimeType=x-scheme-handler/omniverse

06:10:23.845 ? Initialized.
06:10:23.932 ? Logged in.
06:10:50.337 ? Running "/home/administrator/.local/share/ov/pkg/asim-1.0.0/aerial_sim.sh" --/app/environment/name='launcher'

My CPU is an Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz.

Can you provide any suggestions or comments? How can I restart only the backend?

Thank you.

kit_20240806_091855.log (732.5 KB)

@guofachang Your worker is crashing due to cuphy::cuphy_fn_exception. Because the worker crashes, the UI shows the lost connection message. Can you please share the ./src_be/components/common/config_ran.json file?
The error in the UI during command-line startup is possibly unrelated, and I recommend you report it in a separate thread to avoid confusion.

Please also enable the debug logs and share them, e.g.
OMNI_USER=omniverse OMNI_PASS=aerial_123456 ./build/aodt_sim --nucleus omniverse://<ip_addr> --log debug

@kpasad1 Thank you for your reply. My config_ran.json is:

{
    "Cyclic prefix": 288,
    "gNB noise figure": 0.5,
    "UE noise figure": 0.5,
    "DL HARQ enabled": 0,
    "UL HARQ enabled": 0,
    "TDD patterns":{
        "1": "DDDDUUDDDD",
        "2": "DDDDDDDDDD",
        "3": "UUUUUUUUUU"
    },
    "Simulation pattern": 2,
    "Max scheduled UEs per TTI - dl": 6,
    "Max scheduled UEs per TTI - ul": 6
}

@kpasad1 May I ask at which directory level this command should be entered?

@guofachang Your GPU is an RTX A6000. According to the compute capability list (CUDA GPUs - Compute Capability | NVIDIA Developer), this GPU has a compute capability of 8.6. However, you are running a container built for compute capability 8.9 (RTX 6000). We suspect that this is the problem.
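To confirm the mismatch locally, one could query the compute capability from the driver and compare it with the SM level an image was built for. This is only a sketch. It assumes a driver recent enough to support the `compute_cap` query field. `can_run` is a hypothetical helper encoding the general CUDA rule that binary code built for compute capability X.y runs on GPUs with the same major version X and a minor version of y or higher; it ignores any PTX fallback an image might carry.

```shell
# Hypothetical helper: can_run IMAGE_SM DEVICE_CC, e.g. can_run 89 8.6
# Assumes a single-digit minor version (true for current GPUs).
can_run() {
  img_major=${1%?}               # "89"  -> "8"
  img_minor=${1#"$img_major"}    # "89"  -> "9"
  dev_major=${2%%.*}             # "8.6" -> "8"
  dev_minor=${2#*.}              # "8.6" -> "6"
  [ "$img_major" -eq "$dev_major" ] && [ "$img_minor" -le "$dev_minor" ]
}

# Print each GPU's name and compute capability, if nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader || true
fi

can_run 89 8.6 || echo "SM89 image: not compatible with an 8.6 GPU"
can_run 80 8.6 && echo "SM80 image: compatible with an 8.6 GPU"
```

Under that rule, an SM89 image is rejected for an 8.6 device while an SM80 image is accepted.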

You can run a container built for compute capability 8.0 (SM80) instead. To do that:

  1. Copy backend_bundle/docker-compose.yml to another file, e.g. backend_bundle/docker-compose-sm80.yml

  2. Bring down the docker containers:
    > docker compose down

  3. Edit docker-compose-sm80.yml with a text editor and make the following change:
    image: nvcr.io/esee5uzbruax/aodt-sim:1.0.0_runtime_$GEN_CODE → image: nvcr.io/esee5uzbruax/aodt-sim:1.0.0_runtime_SM80

  4. Restart the container:
    > docker compose -f docker-compose-sm80.yml up
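
The four steps above could be scripted roughly as follows. This is a sketch, not an official procedure: `make_sm80_compose` is a hypothetical helper, it assumes the image tag appears in the compose file literally as `..._runtime_$GEN_CODE`, and it assumes the script is run from the directory that contains docker-compose.yml.

```shell
# Sketch of steps 1-4; run from the directory that holds docker-compose.yml.

# Hypothetical helper: copy a compose file, pinning the runtime image tag to SM80.
# Assumes the tag is written literally as ..._runtime_$GEN_CODE, as quoted above.
make_sm80_compose() {
  sed 's/\(aodt-sim:1\.0\.0_runtime_\)\$GEN_CODE/\1SM80/' "$1" > "$2"
}

if [ -f docker-compose.yml ]; then
  make_sm80_compose docker-compose.yml docker-compose-sm80.yml &&
    docker compose down &&
    docker compose -f docker-compose-sm80.yml up
fi
```

The original docker-compose.yml is left untouched, so the default setup can still be started with a plain `docker compose up`.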

Let us know how it goes.

@kpasad1 Thank you for your reply. I will follow your steps to validate.
