Here is the failure message:
aerial@mit-b32-gnb3:~/openairinterface5g/ci-scripts/yaml_files/sa_gh_gnb$ docker compose -f docker-compose-gnb.yaml up
WARN[0000] Found orphan containers ([oai-upf oai-smf oai-amf oai-ausf oai-udm oai-udr oai-ext-dn oai-nrf mysql asterisk-ims]) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
[+] Running 2/0
✔ Container nv-cubb Created 0.0s
✔ Container c_oai-gnb-aerial Created 0.0s
Attaching to c_oai-gnb-aerial, nv-cubb
nv-cubb |
nv-cubb | ==========
nv-cubb | == CUDA ==
nv-cubb | ==========
nv-cubb |
nv-cubb | CUDA Version 12.6.2
nv-cubb |
nv-cubb | Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
nv-cubb |
nv-cubb | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
nv-cubb | By pulling and using the container, you accept the terms and conditions of this license:
nv-cubb | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
nv-cubb |
nv-cubb | A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
nv-cubb |
nv-cubb | Cannot find MPS control daemon process
nv-cubb | Supermicro-G1SMH-G
nv-cubb | Started cuphycontroller on CPU core 2
nv-cubb | AERIAL_LOG_PATH set to /var/log/aerial
nv-cubb | Log file set to /var/log/aerial/phy.log
nv-cubb | Aerial metrics backend address: 127.0.0.1:8081
nv-cubb | 21:05:20.624345 WRN phy_init 0 [CTL.SCF] Config file: /opt/nvidia/cuBB/cuPHY-CP/cuphycontroller/config/cuphycontroller_P5G_FXN_GH.yaml
nv-cubb | 21:05:20.624730 WRN phy_init 0 [CTL.SCF] low_priority_core=10
nv-cubb | 21:05:20.624744 WRN phy_init 0 [APP.CONFIG] Current TAI offset: 0s
nv-cubb | 21:05:20.625037 WRN phy_init 0 [NVLOG.CPP] Using /opt/nvidia/cuBB/cuPHY/nvlog/config/nvlog_config.yaml for nvlog configuration
nv-cubb | 21:05:20.625051 WRN phy_init 0 [NVLOG.CPP] Output log file path /var/log/aerial/phy.log
nv-cubb | YAML invalid key: enable_l1_param_sanity_check Using default value of 0 to YAML_PARAM_ENABLE_L1_PARAM_SANITY_CHECK
nv-cubb | YAML invalid key: pmu_metrics Using default value of 0 to YAML_PARAM_PMU_METRICS
nv-cubb | YAML invalid key: ul_order_max_rx_pkts Using default value of 512 to UL_ORDER_MAX_RX_PKTS
nv-cubb | YAML invalid key: ul_order_rx_pkts_timeout_ns Using default value of 100us to YAML_PARAM_UL_ORDER_RX_PKTS_TIMEOUT_NS
nv-cubb | 21:05:20.649981 FATAL exit: Thread [phy_init] on core 10 file /opt/nvidia/cuBB/cuPHY/src/cuphy/cuphy_pti.cpp line 46: additional info: CUDA Runtime Error: {}:{}:{}
nv-cubb | 21:05:20.636389 WRN phy_init 0 [CTL.YAML] cuphycontroller config. yaml does not have gpu_init_comms_via_cpu key; defaulting to 0.
nv-cubb | 21:05:20.636390 WRN phy_init 0 [CTL.YAML] cuphycontroller config. yaml does not have cpu_init_comms key; defaulting to 0.
nv-cubb | 21:05:20.636496 WRN phy_init 0 [CTL.YAML] cuphycontroller config. yaml does not have pusch_workCancelMode key (experimental feature); defaulting to 0.
nv-cubb | 21:05:20.636549 WRN phy_init 0 [CTL.YAML] cell_id 1 nic_index :0
nv-cubb | 21:05:20.636645 WRN phy_init 0 [CTL.YAML] Num Slots: 8
nv-cubb | 21:05:20.636646 WRN phy_init 0 [CTL.YAML] Enable UL cuPHY Graphs: 1
nv-cubb | 21:05:20.636646 WRN phy_init 0 [CTL.YAML] Enable DL cuPHY Graphs: 1
nv-cubb | 21:05:20.636646 WRN phy_init 0 [CTL.YAML] Accurate TX scheduling clock resolution (ns): 500
nv-cubb | 21:05:20.636647 WRN phy_init 0 [CTL.YAML] DPDK core: 10
nv-cubb | 21:05:20.636647 WRN phy_init 0 [CTL.YAML] Prometheus core: -1
nv-cubb | 21:05:20.636647 WRN phy_init 0 [CTL.YAML] UL cores:
nv-cubb | 21:05:20.636647 WRN phy_init 0 [CTL.YAML] - 4
nv-cubb | 21:05:20.636647 WRN phy_init 0 [CTL.YAML] - 5
nv-cubb | 21:05:20.636647 WRN phy_init 0 [CTL.YAML] DL cores:
nv-cubb | 21:05:20.636647 WRN phy_init 0 [CTL.YAML] - 6
nv-cubb | 21:05:20.636647 WRN phy_init 0 [CTL.YAML] - 7
nv-cubb | 21:05:20.636647 WRN phy_init 0 [CTL.YAML] - 8
nv-cubb | 21:05:20.636648 WRN phy_init 0 [CTL.YAML] Debug worker: -1
nv-cubb | 21:05:20.636648 WRN phy_init 0 [CTL.YAML] Data Lake core: -1
nv-cubb | 21:05:20.636648 WRN phy_init 0 [CTL.YAML] SRS starting Section ID: 3072
nv-cubb | 21:05:20.636648 WRN phy_init 0 [CTL.YAML] PRACH starting Section ID: 2048
nv-cubb | 21:05:20.636648 WRN phy_init 0 [CTL.YAML] USE GREEN CONTEXTS: 0
nv-cubb | 21:05:20.636648 WRN phy_init 0 [CTL.YAML] MPS SM PUSCH: 82
nv-cubb | 21:05:20.636648 WRN phy_init 0 [CTL.YAML] MPS SM PUCCH: 20
nv-cubb | 21:05:20.636648 WRN phy_init 0 [CTL.YAML] MPS SM PRACH: 2
nv-cubb | 21:05:20.636648 WRN phy_init 0 [CTL.YAML] MPS SM UL ORDER: 20
nv-cubb | 21:05:20.636648 WRN phy_init 0 [CTL.YAML] MPS SM PDSCH: 102
nv-cubb | 21:05:20.636648 WRN phy_init 0 [CTL.YAML] MPS SM PDCCH: 10
nv-cubb | 21:05:20.636649 WRN phy_init 0 [CTL.YAML] MPS SM PBCH: 2
nv-cubb | 21:05:20.636649 WRN phy_init 0 [CTL.YAML] MPS SM GPU_COMMS: 16
nv-cubb | 21:05:20.636649 WRN phy_init 0 [CTL.YAML] PDSCH fallback: 0
nv-cubb | 21:05:20.636649 WRN phy_init 0 [CTL.YAML] Massive MIMO enable: 0
nv-cubb | 21:05:20.636649 WRN phy_init 0 [CTL.YAML] Enable SRS : 1
nv-cubb | 21:05:20.636649 WRN phy_init 0 [CTL.YAML] ul_order_timeout_gpu_log_enable: 0
nv-cubb | 21:05:20.636649 WRN phy_init 0 [CTL.YAML] ue_mode: 0
nv-cubb | 21:05:20.636649 WRN phy_init 0 [CTL.YAML] Aggr Obj Non-availability threshold: 5
nv-cubb | 21:05:20.636650 WRN phy_init 0 [CTL.YAML] sendCPlane_timing_error_th_ns: 0
nv-cubb | 21:05:20.636650 WRN phy_init 0 [CTL.YAML] pusch_aggr_per_ctx: 3
nv-cubb | 21:05:20.636650 WRN phy_init 0 [CTL.YAML] prach_aggr_per_ctx: 2
nv-cubb | 21:05:20.636650 WRN phy_init 0 [CTL.YAML] pucch_aggr_per_ctx: 4
nv-cubb | 21:05:20.636650 WRN phy_init 0 [CTL.YAML] srs_aggr_per_ctx: 3
nv-cubb | 21:05:20.636650 WRN phy_init 0 [CTL.YAML] max_harq_pools: 384
nv-cubb | 21:05:20.636650 WRN phy_init 0 [CTL.YAML] ul_input_buffer_per_cell: 10
nv-cubb | 21:05:20.636650 WRN phy_init 0 [CTL.YAML] ul_input_buffer_per_cell_srs: 6
nv-cubb | 21:05:20.636651 WRN phy_init 0 [CTL.YAML] max_ru_unhealthy_ul_slots: 0
nv-cubb | 21:05:20.636651 WRN phy_init 0 [CTL.YAML] srs_chest_algo_type: 0
nv-cubb | 21:05:20.636651 WRN phy_init 0 [CTL.YAML] ul_order_timeout_gpu_log_enable: 0
nv-cubb | 21:05:20.636651 WRN phy_init 0 [CTL.YAML] pusch_workCancelMode: 0
nv-cubb | 21:05:20.636651 WRN phy_init 0 [CTL.YAML] GPU-initiated comms DL: 1
nv-cubb | 21:05:20.636651 WRN phy_init 0 [CTL.YAML] GPU-initiated comms (via CPU): 0
nv-cubb | 21:05:20.636651 WRN phy_init 0 [CTL.YAML] CPU-initiated comms : 0
nv-cubb | 21:05:20.636651 WRN phy_init 0 [CTL.YAML] Cell group: 1
nv-cubb | 21:05:20.636651 WRN phy_init 0 [CTL.YAML] Cell group num: 1
nv-cubb | 21:05:20.636651 WRN phy_init 0 [CTL.YAML] puxchPolarDcdrListSz: 8
nv-cubb | 21:05:20.636651 WRN phy_init 0 [CTL.YAML] split_ul_cuda_streams: 0
nv-cubb | 21:05:20.636651 WRN phy_init 0 [CTL.YAML] serialize_pucch_pusch: 0
nv-cubb | 21:05:20.636651 WRN phy_init 0 [CTL.YAML] Number of Cell Configs: 1
nv-cubb | 21:05:20.636651 WRN phy_init 0 [CTL.YAML] L2Adapter config file: /opt/nvidia/cuBB/cuPHY-CP/cuphycontroller/config/l2_adapter_config_P5G_GH.yaml
nv-cubb | 21:05:20.636651 WRN phy_init 0 [CTL.YAML] Cell name: O-RU 0
nv-cubb | 21:05:20.636652 WRN phy_init 0 [CTL.YAML] MU: 1
nv-cubb | 21:05:20.636652 WRN phy_init 0 [CTL.YAML] ID: 1
nv-cubb | 21:05:20.636652 WRN phy_init 0 [CTL.YAML] Number of MPlane Configs: 1
nv-cubb | 21:05:20.636652 WRN phy_init 0 [CTL.YAML] Mplane ID: 1
nv-cubb | 21:05:20.636652 WRN phy_init 0 [CTL.YAML] VLAN ID: 2
nv-cubb | 21:05:20.636652 WRN phy_init 0 [CTL.YAML] Source Eth Address: 00:00:00:00:00:00
nv-cubb | 21:05:20.636652 WRN phy_init 0 [CTL.YAML] Destination Eth Address: 6c:ad:ad:00:0c:40
nv-cubb | 21:05:20.636652 WRN phy_init 0 [CTL.YAML] NIC port: 0000:01:00.0
nv-cubb | 21:05:20.636653 WRN phy_init 0 [CTL.YAML] RU Type: 1
nv-cubb | 21:05:20.636653 WRN phy_init 0 [CTL.YAML] U-plane TXQs: 1
nv-cubb | 21:05:20.636653 WRN phy_init 0 [CTL.YAML] DL compression method: 1
nv-cubb | 21:05:20.636653 WRN phy_init 0 [CTL.YAML] DL iq bit width: 9
nv-cubb | 21:05:20.636653 WRN phy_init 0 [CTL.YAML] UL compression method: 1
nv-cubb | 21:05:20.636653 WRN phy_init 0 [CTL.YAML] UL iq bit width: 9
nv-cubb | 21:05:20.636653 WRN phy_init 0 [CTL.YAML]
nv-cubb | 21:05:20.636653 WRN phy_init 0 [CTL.YAML] Flow list SSB/PBCH:
nv-cubb | 21:05:20.636653 WRN phy_init 0 [CTL.YAML] 0
nv-cubb | 21:05:20.636654 WRN phy_init 0 [CTL.YAML] 1
nv-cubb | 21:05:20.636654 WRN phy_init 0 [CTL.YAML] 2
nv-cubb | 21:05:20.636654 WRN phy_init 0 [CTL.YAML] 3
nv-cubb | 21:05:20.636654 WRN phy_init 0 [CTL.YAML] Flow list PDCCH:
nv-cubb | 21:05:20.636654 WRN phy_init 0 [CTL.YAML] 0
nv-cubb | 21:05:20.636654 WRN phy_init 0 [CTL.YAML] 1
nv-cubb | 21:05:20.636654 WRN phy_init 0 [CTL.YAML] 2
nv-cubb | 21:05:20.636654 WRN phy_init 0 [CTL.YAML] 3
nv-cubb | 21:05:20.636654 WRN phy_init 0 [CTL.YAML] Flow list PDSCH:
nv-cubb | 21:05:20.636654 WRN phy_init 0 [CTL.YAML] 0
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] 1
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] 2
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] 3
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] Flow list CSIRS:
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] 0
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] 1
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] 2
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] 3
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] Flow list PUSCH:
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] 0
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] 1
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] 2
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] 3
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] Flow list PUCCH:
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] 0
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] 1
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] 2
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] 3
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] Flow list SRS:
nv-cubb | 21:05:20.636655 WRN phy_init 0 [CTL.YAML] 8
nv-cubb | 21:05:20.636656 WRN phy_init 0 [CTL.YAML] 9
nv-cubb | 21:05:20.636656 WRN phy_init 0 [CTL.YAML] 10
nv-cubb | 21:05:20.636656 WRN phy_init 0 [CTL.YAML] 11
nv-cubb | 21:05:20.636656 WRN phy_init 0 [CTL.YAML] Flow list PRACH:
nv-cubb | 21:05:20.636656 WRN phy_init 0 [CTL.YAML] 4
nv-cubb | 21:05:20.636656 WRN phy_init 0 [CTL.YAML] 5
nv-cubb | 21:05:20.636656 WRN phy_init 0 [CTL.YAML] 6
nv-cubb | 21:05:20.636656 WRN phy_init 0 [CTL.YAML] 7
nv-cubb | 21:05:20.636656 WRN phy_init 0 [CTL.YAML] PUSCH TV: /opt/nvidia/cuBB/testVectors/cuPhyChEstCoeffs.h5
nv-cubb | 21:05:20.636656 WRN phy_init 0 [CTL.YAML] SRS TV: /opt/nvidia/cuBB/testVectors/cuPhyChEstCoeffs.h5
nv-cubb | 21:05:20.636656 WRN phy_init 0 [CTL.YAML] Section_3 time offset: 58369
nv-cubb | 21:05:20.636656 WRN phy_init 0 [CTL.YAML] nMaxRxAnt: 4
nv-cubb | 21:05:20.636656 WRN phy_init 0 [CTL.YAML] PUSCH PRBs Stride: 273
nv-cubb | 21:05:20.636656 WRN phy_init 0 [CTL.YAML] PRACH PRBs Stride: 12
nv-cubb | 21:05:20.636656 WRN phy_init 0 [CTL.YAML] SRS PRBs Stride: 273
nv-cubb | 21:05:20.636656 WRN phy_init 0 [CTL.YAML] PUSCH nMaxPrb: 273
nv-cubb | 21:05:20.636657 WRN phy_init 0 [CTL.YAML] PUSCH nMaxRx: 4
nv-cubb | 21:05:20.636657 WRN phy_init 0 [CTL.YAML] UL Gain Calibration: 78.68
nv-cubb | 21:05:20.636657 WRN phy_init 0 [CTL.YAML] Lower guard bw: 845
nv-cubb | 21:05:20.649966 ERR phy_init 0 [AERIAL_INTERNAL_EVENT] [CUPHY.PTI] CUDA Runtime Error: /opt/nvidia/cuBB/cuPHY/src/cuphy/cuphy_pti.cpp:46:MPS client failed to connect to the MPS control daemon or the MPS server
nv-cubb | 21:05:20.649993 ERR phy_init 0 [AERIAL_SYSTEM_API_EVENT] [NVLOG.EXIT_HANDLER] FATAL exit: Thread [phy_init] on core 10 file /opt/nvidia/cuBB/cuPHY/src/cuphy/cuphy_pti.cpp line 46: additional info: CUDA Runtime Error: {}:{}:{}
nv-cubb | Stack trace (most recent call last):
nv-cubb | #7 Object "/usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1", at 0xffffffffffffffff, in
nv-cubb | #6 Object "/opt/nvidia/cuBB/build/cuPHY-CP/cuphycontroller/examples/cuphycontroller_scf", at 0x41276f, in _start
nv-cubb | #5 Object "/usr/lib/aarch64-linux-gnu/libc.so.6", at 0xe972995674cb, in __libc_start_main
nv-cubb | #4 Object "/usr/lib/aarch64-linux-gnu/libc.so.6", at 0xe972995673fb, in
nv-cubb | #3 Object "/opt/nvidia/cuBB/build/cuPHY-CP/cuphycontroller/examples/cuphycontroller_scf", at 0x40f873, in main
nv-cubb | #2 Object "/opt/nvidia/cuBB/build/cuPHY/src/cuphy/libcuphy.so", at 0xe972aa9812ab, in cuphy_pti_init
nv-cubb | #1 Object "/opt/nvidia/cuBB/build/cuPHY/nvlog/libnvlog.so", at 0xe972999ccabb, in exit_handler::test_trigger_exit(char const*, int, char const*)
nv-cubb | #0 Source "/opt/nvidia/cuBB/cuPHY-CP/cuphydriver/src/common/cuphydriver_api.cpp", line 2773, in l1_exit_handler
nv-cubb | 2770: //PhyDriver initialization failure
nv-cubb | 2771: if(l1_getPhydriverHandle() == nullptr)
nv-cubb | 2772: {
nv-cubb | >2773: AERIAL_PRINT_BACKTRACE(32ULL);
nv-cubb | 2774: exit(EXIT_FAILURE); //Exit immediately
nv-cubb | 2775: }
nv-cubb | 21:05:20.750061 WRN phy_init 0 [DRV.API] Trigging L1 exit handler
nv-cubb | [C]: Usage: ./build/cuPHY-CP/gt_common_libs/nvIPC/tests/pcap/pcap_collect <name> <destination path>
nv-cubb |
nv-cubb | [C]: Current run: ./build/cuPHY-CP/gt_common_libs/nvIPC/tests/pcap/pcap_collect name=nvipc dest_path=/var/log/aerial
nv-cubb |
nv-cubb | [I]: shmlogger_collect: save /var/log/aerial/nvipc_pcap and /dev/shm/nvipc_pcap logs to /var/log/aerial/nvipc_pcap
nv-cubb | [E][AERIAL_SYSTEM_API_EVENT]: ipc_shm_open: shm_open nvipc_pcap failed error -1
nv-cubb | [E][AERIAL_NVIPC_API_EVENT]: nv_ipc_shm_open: primary=0 name=nvipc_pcap size=8388680 Failed
nv-cubb | [E][AERIAL_SYSTEM_API_EVENT]: ipc_shm_close: close shm_fd failed
nv-cubb | [E][AERIAL_NVIPC_API_EVENT]: shmlogger_open: nv_ipc_shm_open failed
nv-cubb | [I]: shmlogger_collect: no /dev/shm/nvipc_pcap, logger may have been closed normally
Gracefully stopping... (press Ctrl+C again to force)
dependency failed to start: container nv-cubb exited (0)
Please provide any feedback.
Hi @subhams,
Please initiate the MPS service before starting cuphycontroller. You can find the instructions here.
This should only be run on the cuphycontroller terminal and not for test_mac.
Thanks.
The error repeats after following these steps:
# Export variables
export CUDA_DEVICE_MAX_CONNECTIONS=8
export CUDA_MPS_PIPE_DIRECTORY=/var
export CUDA_MPS_LOG_DIRECTORY=/var
# Stop existing MPS
sudo -E echo quit | sudo -E nvidia-cuda-mps-control
# Start MPS
sudo -E nvidia-cuda-mps-control -d
sudo -E echo start_server -uid 0 | sudo -E nvidia-cuda-mps-control
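Two optional sanity checks after the commands above, assuming the pipe and log directories exported there (the control daemon writes its log as control.log under CUDA_MPS_LOG_DIRECTORY):
# Confirm the daemon created its pipes and log in the chosen directories
sudo ls -l "$CUDA_MPS_PIPE_DIRECTORY"
sudo tail "$CUDA_MPS_LOG_DIRECTORY/control.log"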
Here is the error log:
aerial@mit-b32-gnb3:~/openairinterface5g/ci-scripts/yaml_files/sa_gh_gnb$ docker compose -f docker-compose-gnb.yaml up
WARN[0000] Found orphan containers ([oai-upf oai-smf oai-amf oai-ausf oai-udm oai-udr oai-ext-dn oai-nrf mysql asterisk-ims]) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
[+] Running 1/0
✔ Container nv-cubb Created 0.0s
Attaching to c_oai-gnb-aerial, nv-cubb
nv-cubb |
nv-cubb | ==========
nv-cubb | == CUDA ==
nv-cubb | ==========
nv-cubb |
nv-cubb | CUDA Version 12.6.2
nv-cubb |
nv-cubb | Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
nv-cubb |
nv-cubb | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
nv-cubb | By pulling and using the container, you accept the terms and conditions of this license:
nv-cubb | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
nv-cubb |
nv-cubb | A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
nv-cubb |
nv-cubb | Cannot find MPS control daemon process
nv-cubb | Supermicro-G1SMH-G
nv-cubb | Started cuphycontroller on CPU core 69
nv-cubb | AERIAL_LOG_PATH set to /var/log/aerial
nv-cubb | Log file set to /var/log/aerial/phy.log
nv-cubb | Aerial metrics backend address: 127.0.0.1:8081
nv-cubb | 23:12:56.432324 WRN phy_init 0 [CTL.SCF] Config file: /opt/nvidia/cuBB/cuPHY-CP/cuphycontroller/config/cuphycontroller_P5G_FXN_GH.yaml
nv-cubb | 23:12:56.432725 WRN phy_init 0 [CTL.SCF] low_priority_core=10
nv-cubb | 23:12:56.432739 WRN phy_init 0 [APP.CONFIG] Current TAI offset: 0s
nv-cubb | 23:12:56.432954 WRN phy_init 0 [NVLOG.CPP] Using /opt/nvidia/cuBB/cuPHY/nvlog/config/nvlog_config.yaml for nvlog configuration
nv-cubb | 23:12:56.432967 WRN phy_init 0 [NVLOG.CPP] Output log file path /var/log/aerial/phy.log
nv-cubb | YAML invalid key: enable_l1_param_sanity_check Using default value of 0 to YAML_PARAM_ENABLE_L1_PARAM_SANITY_CHECK
nv-cubb | YAML invalid key: pmu_metrics Using default value of 0 to YAML_PARAM_PMU_METRICS
nv-cubb | YAML invalid key: ul_order_max_rx_pkts Using default value of 512 to UL_ORDER_MAX_RX_PKTS
nv-cubb | YAML invalid key: ul_order_rx_pkts_timeout_ns Using default value of 100us to YAML_PARAM_UL_ORDER_RX_PKTS_TIMEOUT_NS
nv-cubb | 23:12:56.457426 FATAL exit: Thread [phy_init] on core 10 file /opt/nvidia/cuBB/cuPHY/src/cuphy/cuphy_pti.cpp line 46: additional info: CUDA Runtime Error: {}:{}:{}
nv-cubb | 23:12:56.444264 WRN phy_init 0 [CTL.YAML] cuphycontroller config. yaml does not have gpu_init_comms_via_cpu key; defaulting to 0.
nv-cubb | 23:12:56.444265 WRN phy_init 0 [CTL.YAML] cuphycontroller config. yaml does not have cpu_init_comms key; defaulting to 0.
nv-cubb | 23:12:56.444368 WRN phy_init 0 [CTL.YAML] cuphycontroller config. yaml does not have pusch_workCancelMode key (experimental feature); defaulting to 0.
nv-cubb | 23:12:56.444415 WRN phy_init 0 [CTL.YAML] cell_id 1 nic_index :0
nv-cubb | 23:12:56.444507 WRN phy_init 0 [CTL.YAML] Num Slots: 8
nv-cubb | 23:12:56.444507 WRN phy_init 0 [CTL.YAML] Enable UL cuPHY Graphs: 1
nv-cubb | 23:12:56.444507 WRN phy_init 0 [CTL.YAML] Enable DL cuPHY Graphs: 1
nv-cubb | 23:12:56.444507 WRN phy_init 0 [CTL.YAML] Accurate TX scheduling clock resolution (ns): 500
nv-cubb | 23:12:56.444508 WRN phy_init 0 [CTL.YAML] DPDK core: 10
nv-cubb | 23:12:56.444508 WRN phy_init 0 [CTL.YAML] Prometheus core: -1
nv-cubb | 23:12:56.444508 WRN phy_init 0 [CTL.YAML] UL cores:
nv-cubb | 23:12:56.444508 WRN phy_init 0 [CTL.YAML] - 4
nv-cubb | 23:12:56.444508 WRN phy_init 0 [CTL.YAML] - 5
nv-cubb | 23:12:56.444508 WRN phy_init 0 [CTL.YAML] DL cores:
nv-cubb | 23:12:56.444509 WRN phy_init 0 [CTL.YAML] - 6
nv-cubb | 23:12:56.444509 WRN phy_init 0 [CTL.YAML] - 7
nv-cubb | 23:12:56.444509 WRN phy_init 0 [CTL.YAML] - 8
nv-cubb | 23:12:56.444509 WRN phy_init 0 [CTL.YAML] Debug worker: -1
nv-cubb | 23:12:56.444509 WRN phy_init 0 [CTL.YAML] Data Lake core: -1
nv-cubb | 23:12:56.444509 WRN phy_init 0 [CTL.YAML] SRS starting Section ID: 3072
nv-cubb | 23:12:56.444510 WRN phy_init 0 [CTL.YAML] PRACH starting Section ID: 2048
nv-cubb | 23:12:56.444510 WRN phy_init 0 [CTL.YAML] USE GREEN CONTEXTS: 0
nv-cubb | 23:12:56.444510 WRN phy_init 0 [CTL.YAML] MPS SM PUSCH: 82
nv-cubb | 23:12:56.444510 WRN phy_init 0 [CTL.YAML] MPS SM PUCCH: 20
nv-cubb | 23:12:56.444510 WRN phy_init 0 [CTL.YAML] MPS SM PRACH: 2
nv-cubb | 23:12:56.444510 WRN phy_init 0 [CTL.YAML] MPS SM UL ORDER: 20
nv-cubb | 23:12:56.444510 WRN phy_init 0 [CTL.YAML] MPS SM PDSCH: 102
nv-cubb | 23:12:56.444510 WRN phy_init 0 [CTL.YAML] MPS SM PDCCH: 10
nv-cubb | 23:12:56.444510 WRN phy_init 0 [CTL.YAML] MPS SM PBCH: 2
nv-cubb | 23:12:56.444510 WRN phy_init 0 [CTL.YAML] MPS SM GPU_COMMS: 16
nv-cubb | 23:12:56.444510 WRN phy_init 0 [CTL.YAML] PDSCH fallback: 0
nv-cubb | 23:12:56.444511 WRN phy_init 0 [CTL.YAML] Massive MIMO enable: 0
nv-cubb | 23:12:56.444511 WRN phy_init 0 [CTL.YAML] Enable SRS : 1
nv-cubb | 23:12:56.444511 WRN phy_init 0 [CTL.YAML] ul_order_timeout_gpu_log_enable: 0
nv-cubb | 23:12:56.444511 WRN phy_init 0 [CTL.YAML] ue_mode: 0
nv-cubb | 23:12:56.444511 WRN phy_init 0 [CTL.YAML] Aggr Obj Non-availability threshold: 5
nv-cubb | 23:12:56.444512 WRN phy_init 0 [CTL.YAML] sendCPlane_timing_error_th_ns: 0
nv-cubb | 23:12:56.444512 WRN phy_init 0 [CTL.YAML] pusch_aggr_per_ctx: 3
nv-cubb | 23:12:56.444512 WRN phy_init 0 [CTL.YAML] prach_aggr_per_ctx: 2
nv-cubb | 23:12:56.444512 WRN phy_init 0 [CTL.YAML] pucch_aggr_per_ctx: 4
nv-cubb | 23:12:56.444512 WRN phy_init 0 [CTL.YAML] srs_aggr_per_ctx: 3
nv-cubb | 23:12:56.444512 WRN phy_init 0 [CTL.YAML] max_harq_pools: 384
nv-cubb | 23:12:56.444512 WRN phy_init 0 [CTL.YAML] ul_input_buffer_per_cell: 10
nv-cubb | 23:12:56.444512 WRN phy_init 0 [CTL.YAML] ul_input_buffer_per_cell_srs: 6
nv-cubb | 23:12:56.444513 WRN phy_init 0 [CTL.YAML] max_ru_unhealthy_ul_slots: 0
nv-cubb | 23:12:56.444513 WRN phy_init 0 [CTL.YAML] srs_chest_algo_type: 0
nv-cubb | 23:12:56.444513 WRN phy_init 0 [CTL.YAML] ul_order_timeout_gpu_log_enable: 0
nv-cubb | 23:12:56.444513 WRN phy_init 0 [CTL.YAML] pusch_workCancelMode: 0
nv-cubb | 23:12:56.444513 WRN phy_init 0 [CTL.YAML] GPU-initiated comms DL: 1
nv-cubb | 23:12:56.444513 WRN phy_init 0 [CTL.YAML] GPU-initiated comms (via CPU): 0
nv-cubb | 23:12:56.444513 WRN phy_init 0 [CTL.YAML] CPU-initiated comms : 0
nv-cubb | 23:12:56.444513 WRN phy_init 0 [CTL.YAML] Cell group: 1
nv-cubb | 23:12:56.444513 WRN phy_init 0 [CTL.YAML] Cell group num: 1
nv-cubb | 23:12:56.444513 WRN phy_init 0 [CTL.YAML] puxchPolarDcdrListSz: 8
nv-cubb | 23:12:56.444514 WRN phy_init 0 [CTL.YAML] split_ul_cuda_streams: 0
nv-cubb | 23:12:56.444514 WRN phy_init 0 [CTL.YAML] serialize_pucch_pusch: 0
nv-cubb | 23:12:56.444514 WRN phy_init 0 [CTL.YAML] Number of Cell Configs: 1
nv-cubb | 23:12:56.444514 WRN phy_init 0 [CTL.YAML] L2Adapter config file: /opt/nvidia/cuBB/cuPHY-CP/cuphycontroller/config/l2_adapter_config_P5G_GH.yaml
nv-cubb | 23:12:56.444514 WRN phy_init 0 [CTL.YAML] Cell name: O-RU 0
nv-cubb | 23:12:56.444514 WRN phy_init 0 [CTL.YAML] MU: 1
nv-cubb | 23:12:56.444514 WRN phy_init 0 [CTL.YAML] ID: 1
nv-cubb | 23:12:56.444514 WRN phy_init 0 [CTL.YAML] Number of MPlane Configs: 1
nv-cubb | 23:12:56.444515 WRN phy_init 0 [CTL.YAML] Mplane ID: 1
nv-cubb | 23:12:56.444515 WRN phy_init 0 [CTL.YAML] VLAN ID: 2
nv-cubb | 23:12:56.444515 WRN phy_init 0 [CTL.YAML] Source Eth Address: 00:00:00:00:00:00
nv-cubb | 23:12:56.444515 WRN phy_init 0 [CTL.YAML] Destination Eth Address: 6c:ad:ad:00:0c:40
nv-cubb | 23:12:56.444515 WRN phy_init 0 [CTL.YAML] NIC port: 0000:01:00.0
nv-cubb | 23:12:56.444515 WRN phy_init 0 [CTL.YAML] RU Type: 1
nv-cubb | 23:12:56.444516 WRN phy_init 0 [CTL.YAML] U-plane TXQs: 1
nv-cubb | 23:12:56.444516 WRN phy_init 0 [CTL.YAML] DL compression method: 1
nv-cubb | 23:12:56.444516 WRN phy_init 0 [CTL.YAML] DL iq bit width: 9
nv-cubb | 23:12:56.444516 WRN phy_init 0 [CTL.YAML] UL compression method: 1
nv-cubb | 23:12:56.444516 WRN phy_init 0 [CTL.YAML] UL iq bit width: 9
nv-cubb | 23:12:56.444516 WRN phy_init 0 [CTL.YAML]
nv-cubb | 23:12:56.444516 WRN phy_init 0 [CTL.YAML] Flow list SSB/PBCH:
nv-cubb | 23:12:56.444517 WRN phy_init 0 [CTL.YAML] 0
nv-cubb | 23:12:56.444517 WRN phy_init 0 [CTL.YAML] 1
nv-cubb | 23:12:56.444517 WRN phy_init 0 [CTL.YAML] 2
nv-cubb | 23:12:56.444517 WRN phy_init 0 [CTL.YAML] 3
nv-cubb | 23:12:56.444517 WRN phy_init 0 [CTL.YAML] Flow list PDCCH:
nv-cubb | 23:12:56.444517 WRN phy_init 0 [CTL.YAML] 0
nv-cubb | 23:12:56.444517 WRN phy_init 0 [CTL.YAML] 1
nv-cubb | 23:12:56.444517 WRN phy_init 0 [CTL.YAML] 2
nv-cubb | 23:12:56.444517 WRN phy_init 0 [CTL.YAML] 3
nv-cubb | 23:12:56.444517 WRN phy_init 0 [CTL.YAML] Flow list PDSCH:
nv-cubb | 23:12:56.444518 WRN phy_init 0 [CTL.YAML] 0
nv-cubb | 23:12:56.444518 WRN phy_init 0 [CTL.YAML] 1
nv-cubb | 23:12:56.444518 WRN phy_init 0 [CTL.YAML] 2
nv-cubb | 23:12:56.444518 WRN phy_init 0 [CTL.YAML] 3
nv-cubb | 23:12:56.444518 WRN phy_init 0 [CTL.YAML] Flow list CSIRS:
nv-cubb | 23:12:56.444518 WRN phy_init 0 [CTL.YAML] 0
nv-cubb | 23:12:56.444518 WRN phy_init 0 [CTL.YAML] 1
nv-cubb | 23:12:56.444518 WRN phy_init 0 [CTL.YAML] 2
nv-cubb | 23:12:56.444518 WRN phy_init 0 [CTL.YAML] 3
nv-cubb | 23:12:56.444518 WRN phy_init 0 [CTL.YAML] Flow list PUSCH:
nv-cubb | 23:12:56.444519 WRN phy_init 0 [CTL.YAML] 0
nv-cubb | 23:12:56.444519 WRN phy_init 0 [CTL.YAML] 1
nv-cubb | 23:12:56.444519 WRN phy_init 0 [CTL.YAML] 2
nv-cubb | 23:12:56.444519 WRN phy_init 0 [CTL.YAML] 3
nv-cubb | 23:12:56.444519 WRN phy_init 0 [CTL.YAML] Flow list PUCCH:
nv-cubb | 23:12:56.444519 WRN phy_init 0 [CTL.YAML] 0
nv-cubb | 23:12:56.444519 WRN phy_init 0 [CTL.YAML] 1
nv-cubb | 23:12:56.444519 WRN phy_init 0 [CTL.YAML] 2
nv-cubb | 23:12:56.444519 WRN phy_init 0 [CTL.YAML] 3
nv-cubb | 23:12:56.444519 WRN phy_init 0 [CTL.YAML] Flow list SRS:
nv-cubb | 23:12:56.444519 WRN phy_init 0 [CTL.YAML] 8
nv-cubb | 23:12:56.444519 WRN phy_init 0 [CTL.YAML] 9
nv-cubb | 23:12:56.444519 WRN phy_init 0 [CTL.YAML] 10
nv-cubb | 23:12:56.444520 WRN phy_init 0 [CTL.YAML] 11
nv-cubb | 23:12:56.444520 WRN phy_init 0 [CTL.YAML] Flow list PRACH:
nv-cubb | 23:12:56.444520 WRN phy_init 0 [CTL.YAML] 4
nv-cubb | 23:12:56.444520 WRN phy_init 0 [CTL.YAML] 5
nv-cubb | 23:12:56.444520 WRN phy_init 0 [CTL.YAML] 6
nv-cubb | 23:12:56.444520 WRN phy_init 0 [CTL.YAML] 7
nv-cubb | 23:12:56.444520 WRN phy_init 0 [CTL.YAML] PUSCH TV: /opt/nvidia/cuBB/testVectors/cuPhyChEstCoeffs.h5
nv-cubb | 23:12:56.444520 WRN phy_init 0 [CTL.YAML] SRS TV: /opt/nvidia/cuBB/testVectors/cuPhyChEstCoeffs.h5
nv-cubb | 23:12:56.444520 WRN phy_init 0 [CTL.YAML] Section_3 time offset: 58369
nv-cubb | 23:12:56.444520 WRN phy_init 0 [CTL.YAML] nMaxRxAnt: 4
nv-cubb | 23:12:56.444521 WRN phy_init 0 [CTL.YAML] PUSCH PRBs Stride: 273
nv-cubb | 23:12:56.444521 WRN phy_init 0 [CTL.YAML] PRACH PRBs Stride: 12
nv-cubb | 23:12:56.444521 WRN phy_init 0 [CTL.YAML] SRS PRBs Stride: 273
nv-cubb | 23:12:56.444521 WRN phy_init 0 [CTL.YAML] PUSCH nMaxPrb: 273
nv-cubb | 23:12:56.444521 WRN phy_init 0 [CTL.YAML] PUSCH nMaxRx: 4
nv-cubb | 23:12:56.444521 WRN phy_init 0 [CTL.YAML] UL Gain Calibration: 78.68
nv-cubb | 23:12:56.444521 WRN phy_init 0 [CTL.YAML] Lower guard bw: 845
nv-cubb | 23:12:56.457410 ERR phy_init 0 [AERIAL_INTERNAL_EVENT] [CUPHY.PTI] CUDA Runtime Error: /opt/nvidia/cuBB/cuPHY/src/cuphy/cuphy_pti.cpp:46:MPS client failed to connect to the MPS control daemon or the MPS server
nv-cubb | 23:12:56.457437 ERR phy_init 0 [AERIAL_SYSTEM_API_EVENT] [NVLOG.EXIT_HANDLER] FATAL exit: Thread [phy_init] on core 10 file /opt/nvidia/cuBB/cuPHY/src/cuphy/cuphy_pti.cpp line 46: additional info: CUDA Runtime Error: {}:{}:{}
nv-cubb | Stack trace (most recent call last):
nv-cubb | #7 Object "/usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1", at 0xffffffffffffffff, in
nv-cubb | #6 Object "/opt/nvidia/cuBB/build/cuPHY-CP/cuphycontroller/examples/cuphycontroller_scf", at 0x41276f, in _start
nv-cubb | #5 Object "/usr/lib/aarch64-linux-gnu/libc.so.6", at 0xeab9490474cb, in __libc_start_main
nv-cubb | #4 Object "/usr/lib/aarch64-linux-gnu/libc.so.6", at 0xeab9490473fb, in
nv-cubb | #3 Object "/opt/nvidia/cuBB/build/cuPHY-CP/cuphycontroller/examples/cuphycontroller_scf", at 0x40f873, in main
nv-cubb | #2 Object "/opt/nvidia/cuBB/build/cuPHY/src/cuphy/libcuphy.so", at 0xeab95a4612ab, in cuphy_pti_init
nv-cubb | #1 Object "/opt/nvidia/cuBB/build/cuPHY/nvlog/libnvlog.so", at 0xeab9494acabb, in exit_handler::test_trigger_exit(char const*, int, char const*)
nv-cubb | #0 Source "/opt/nvidia/cuBB/cuPHY-CP/cuphydriver/src/common/cuphydriver_api.cpp", line 2773, in l1_exit_handler
nv-cubb | 2770: //PhyDriver initialization failure
nv-cubb | 2771: if(l1_getPhydriverHandle() == nullptr)
nv-cubb | 2772: {
nv-cubb | >2773: AERIAL_PRINT_BACKTRACE(32ULL);
nv-cubb | 2774: exit(EXIT_FAILURE); //Exit immediately
nv-cubb | 2775: }
nv-cubb | 23:12:56.557507 WRN phy_init 0 [DRV.API] Trigging L1 exit handler
nv-cubb | [C]: Usage: ./build/cuPHY-CP/gt_common_libs/nvIPC/tests/pcap/pcap_collect <name> <destination path>
nv-cubb |
nv-cubb | [C]: Current run: ./build/cuPHY-CP/gt_common_libs/nvIPC/tests/pcap/pcap_collect name=nvipc dest_path=/var/log/aerial
nv-cubb |
nv-cubb | [I]: shmlogger_collect: save /var/log/aerial/nvipc_pcap and /dev/shm/nvipc_pcap logs to /var/log/aerial/nvipc_pcap
nv-cubb | [E][AERIAL_SYSTEM_API_EVENT]: ipc_shm_open: shm_open nvipc_pcap failed error -1
nv-cubb | [E][AERIAL_NVIPC_API_EVENT]: nv_ipc_shm_open: primary=0 name=nvipc_pcap size=8388680 Failed
nv-cubb | [E][AERIAL_SYSTEM_API_EVENT]: ipc_shm_close: close shm_fd failed
nv-cubb | [E][AERIAL_NVIPC_API_EVENT]: shmlogger_open: nv_ipc_shm_open failed
nv-cubb | [I]: shmlogger_collect: no /dev/shm/nvipc_pcap, logger may have been closed normally
nv-cubb exited with code 0
Gracefully stopping... (press Ctrl+C again to force)
dependency failed to start: container nv-cubb exited (0)
There is still an issue initiating the MPS service.
After starting the MPS service, can you check whether it is running?
ps -ef | grep nvidia-cuda-mps-control
Can you also check whether the control daemon responds?
echo get_server_list | nvidia-cuda-mps-control
When I ran the commands above, I got the following output:
aerial@mit-b32-gnb3:~$ ps -ef | grep nvidia-cuda-mps-control
root 170029 1 0 Apr04 ? 00:00:00 nvidia-cuda-mps-control -d
aerial 2334725 2332746 0 14:02 pts/2 00:00:00 grep --color=auto nvidia-cuda-mps-control
aerial@mit-b32-gnb3:~$ echo get_server_list | nvidia-cuda-mps-control
Cannot find MPS control daemon process
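That output is consistent with the symptom: the control daemon is running (as root), but the get_server_list query was issued from a shell where CUDA_MPS_PIPE_DIRECTORY was not set, so the client looks for the control pipe in the default location (/tmp/nvidia-mps) and reports "Cannot find MPS control daemon process". A query that should reach the daemon, assuming it was started with the /var pipe directory as in the earlier post:
export CUDA_MPS_PIPE_DIRECTORY=/var
echo get_server_list | sudo -E nvidia-cuda-mps-control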
@subhams Can you share the script that you use to configure and start the cuBB container?
Here is the script:
aerial@mit-b32-gnb3:~/openairinterface5g/ci-scripts/yaml_files/sa_gh_gnb$ cat docker-compose-gnb.yaml
services:
  nv-cubb:
    container_name: nv-cubb
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu
    network_mode: host
    shm_size: 4096m
    privileged: true
    stdin_open: true
    tty: true
    volumes:
      - /lib/modules:/lib/modules
      - /dev/hugepages:/dev/hugepages
      - /usr/src:/usr/src
      - ./aerial_l1_entrypoint.sh:/opt/nvidia/cuBB/aerial_l1_entrypoint.sh
      - /var/log/aerial:/var/log/aerial
      - ../../../cmake_targets/share:/opt/cuBB/share
    userns_mode: host
    ipc: "shareable"
    image: cubb-build:24-3
    environment:
      - cuBB_SDK=/opt/nvidia/cuBB
    command: bash -c "sudo rm -rf /tmp/phy.log && sudo chmod +x /opt/nvidia/cuBB/aerial_l1_entrypoint.sh && /opt/nvidia/cuBB/aerial_l1_entrypoint.sh"
    healthcheck:
      test: ["CMD-SHELL", 'grep -q "L1 is ready!" /tmp/phy.log && echo 0 || echo 1']
      interval: 20s
      timeout: 5s
      retries: 5
  c_oai-gnb-aerial:
    image: oai-gnb-aerial:latest
    depends_on:
      nv-cubb:
        condition: service_healthy
    privileged: true
    ipc: "container:nv-cubb"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu
    network_mode: host
    shm_size: 4096m
    stdin_open: true
    tty: true
    volumes:
      - /lib/modules:/lib/modules
      - /dev/hugepages:/dev/hugepages
      - /usr/src:/usr/src
      - ~/share:/opt/nvidia/cuBB/share
      - /var/log/aerial:/var/log/aerial
      # Use this for CBRS radios
      #- ../../../targets/PROJECTS/GENERIC-NR-5GC/CONF/gnb-vnf.sa.cbrs.aerial.conf:/opt/oai-gnb/etc/gnb.conf
      - ../../../targets/PROJECTS/GENERIC-NR-5GC/CONF/gnb-vnf.sa.band78.273prb.aerial.conf:/opt/oai-gnb/etc/gnb.conf
    container_name: c_oai-gnb-aerial
    command: bash -c "chrt -r 1 taskset -c 11-16 chrt -f 95 /opt/oai-gnb/bin/nr-softmodem -O /opt/oai-gnb/etc/gnb.conf | tee /var/log/aerial/oai.log"
    #cpuset: 11-18
    healthcheck:
      test: /bin/bash -c "ps aux | grep -v grep | grep -c softmodem"
      interval: 10s
      timeout: 5s
      retries: 5
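Nothing in this compose file makes the host's MPS pipe directory visible to the nv-cubb container, which matches the "Cannot find MPS control daemon process" line printed at container startup. A minimal sketch of what the nv-cubb service might additionally need, assuming MPS runs on the host with CUDA_MPS_PIPE_DIRECTORY=/var as in the earlier post (the variable names are standard CUDA MPS; the extra mount is an assumption, not taken from the Aerial docs):
    environment:
      - cuBB_SDK=/opt/nvidia/cuBB
      - CUDA_MPS_PIPE_DIRECTORY=/var
      - CUDA_MPS_LOG_DIRECTORY=/var
    volumes:
      # existing mounts unchanged, plus the host MPS pipe directory
      - /var:/var
In practice a dedicated pipe directory (for example the default /tmp/nvidia-mps) is safer to share than all of /var.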
@subhams, did you make any changes to the aerial_l1_entrypoint.sh script here?
Which version of ARC are you using?
Thank you.
We updated the entrypoint script with the new interface ID and MAC address of the RU.
We are using ARC 24-3.
Thank you.
@subhams, can you capture the outputs of the following commands?
nvidia-smi
lsmod | grep -i nvidia
ps -ef | grep -i mps
Here are the outputs:
aerial@mit-b32-gnb3:~$ nvidia-smi
Mon Apr 7 18:23:43 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02 Driver Version: 555.42.02 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GH200 480GB On | 00000009:01:00.0 Off | 0 |
| N/A 33C P0 131W / 900W | 113MiB / 97871MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 170458 C nvidia-cuda-mps-server 104MiB |
+-----------------------------------------------------------------------------------------+
aerial@mit-b32-gnb3:~$ lsmod | grep -i nvidia
nvidia_uvm 4784128 2
nvidia_drm 262144 0
nvidia_modeset 1835008 1 nvidia_drm
nvidia 9043968 74 nvidia_uvm,gdrdrv,nvidia_modeset
video 262144 1 nvidia_modeset
ecc 196608 1 nvidia
drm_kms_helper 327680 4 ast,nvidia_drm
drm 983040 6 drm_kms_helper,ast,drm_shmem_helper,nvidia,nvidia_drm
aerial@mit-b32-gnb3:~$ ps -ef | grep -i mps
root 170029 1 0 Apr04 ? 00:00:00 nvidia-cuda-mps-control -d
root 170458 170029 0 Apr04 ? 00:00:27 nvidia-cuda-mps-server
aerial 1509018 1509000 0 18:25 pts/2 00:00:00 grep --color=auto -i mps
@subhams, would you please upgrade the GPU driver to 560.35.03 and recheck?
1) Unload the current driver modules:
$ for m in $(lsmod | awk '/^(nvidia|nv_|gdrdrv)/ {print $1}'); do echo "Unload $m..."; sudo rmmod $m; done
Remove the driver if it was previously installed by the runfile installer:
$ sudo /usr/bin/nvidia-uninstall
2) Purge the driver and CUDA packages:
sudo apt-get --purge remove "*cublas*" "*cufft*" "*curand*" "*cusolver*" "*cusparse*" "*npp*" "*nvjpeg*" "cuda*" "nsight*" "*nvidia*"
sudo apt-get autoremove
3) Power-cycle the server.
4) Follow the steps here to install GPU driver 560.35.03:
https://docs.nvidia.com/aerial/cuda-accelerated-ran/aerial_cubb/cubb_install/installing_tools_gh.html#install-cuda-driver
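Once the reinstall and reboot are done, the new driver can be confirmed before retrying the compose stack. For context, the earlier nvidia-smi output shows driver 555.42.02 (CUDA 12.5), while the nv-cubb image reports CUDA 12.6.2, which lines up with the 560-series driver requested here:
nvidia-smi --query-gpu=driver_version --format=csv,noheader   # expect 560.35.03
lsmod | grep -i nvidia                                        # confirm the driver modules reloaded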
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.