Error message from "multiCellSchedulerUeSelection -t 1"

Hi there,
Following the instructions in “Aerial CUDA-Accelerated RAN, Release 24-2.1”, I ran this command:
./aerial_sdk/cuMAC/build/examples/multiCellSchedulerUeSelection/multiCellSchedulerUeSelection -t 1
I got the following errors at the end of the simulation:
CPU and GPU scheduler per-UE throughput performance check result: FAIL
Largest gap (in percentage) between CPU and GPU per-UE throughput CDFs = 3.850001%
CPU and GPU scheduler sum throughput performance check result: FAIL
Largest gap (in percentage) between CPU and GPU sum throughput curves = 9.520144%
CPU and GPU scheduler performance check result: FAIL
Is this failure expected, or is something wrong with my cuMAC installation?
Thanks
Sheng

Hi @sheng.li ,

We have tested with Rel 24-2.1 and do not see the same issue. There might be an issue with your configuration.

Can you please share the parameters.h file located under the cuMAC/examples folder?

Thank you.

Hi @bkecicioglu ,

Thanks for your help. Here is the parameters.h file you asked for. Sorry, I don’t know how to upload files, so I have copied and pasted it here:

/*
 * Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 *
 * NVIDIA CORPORATION and its licensors retain all intellectual property
 * and proprietary rights in and to this software, related documentation
 * and any modifications thereto. Any use, reproduction, disclosure or
 * distribution of this software and related documentation without an express
 * license agreement from NVIDIA CORPORATION is strictly prohibited.
 */

#pragma once

// debug parameters
// #define CHANNEL_REUSE_
// #define SCENARIO_1_
// #define OUTPUT_SOLUTION_
// #define LIMIT_NUM_SM_TIME_MEASURE_
// #define CELLASSOCIATION_KERNEL_TIME_MEASURE_
// #define MCSCHEDULER_DEBUG_
// #define SCSCHEDULER_DEBUG_
// #define CHANN_INPUT_DEBUG_
// #define CELLASSOCIATION_PRINT_SAMPLE_
#if (defined(SCHEDULER_KERNEL_TIME_MEASURE_) || defined(CELLASSOCIATION_KERNEL_TIME_MEASURE_))
#define numRunSchKnlTimeMsr 1000
#endif

// GPU index
#define gpuDeviceIdx 7 // index of GPU device to use

// simulation duration
#define numSimChnRlz 2000 // total number of simulated TTIs (e.g., 15000 for 1200 active UEs per cell; 5000 for 500 active UEs per cell)

// randomness
#define seedConst 0 // randomness seed

// system parameters
//#define mu 0 // OFDM numerology: 0, 1, 2, 3, 4
#define slotDurationConst 0.5e-3 // 1.0e-3, 0.5e-3, 0.25e-3, 0.125e-3, 0.0625e-3
#define scsConst 30000.0 // 15000.0, 30000.0, 60000.0, 120000.0, 240000.0 corresponding to OFDM numerology: 0, 1, 2, 3, 4
#define numMcsLevels 28
#define cellRadiusConst 1000
#define numCellConst 20 // total number of cells in the network, including coordinated cells and interfering cells
#define numCoorCellConst numCellConst // currently support max 21 coordinated cells
#define numUePerCellConst 16 // number of UEs scheduled per time slot per cell
#define numActiveUePerCellConst 100 // 100, 500, 1200. should be <= 2048
#define totNumUesConst numCellConst*numUePerCellConst // total number of scheduled UEs per TTI that are associated with the coordinated cells
#define totNumActiveUesConst numCellConst*numActiveUePerCellConst // total number of active UEs associated with the coordinated cells
#define nBsAntConst 4
#define nUeAntConst 4 // assumption's that nUeAntConst <= nBsAntConst; nUeAntConst is also equal to the maximum number of layers
#define nPrbsPerGrpConst 4
#define nPrbGrpsConst 68
#define WConst 12.0*scsConst*nPrbsPerGrpConst
#define totWConst WConst*nPrbGrpsConst
#define PtConst 79.4328 // Macrocell - 49.0 dBm (79.4328 W), Microcell - 23 dBm (0.1995 W)
#define PtRbgConst PtConst/nPrbGrpsConst
#define PtRbgAntConst PtRbgConst/nBsAntConst
#define bandwidthRBConst 12*scsConst
#define bandwidthRBGConst nPrbsPerGrpConst*bandwidthRBConst
#define noiseFigureConst 9 // dB
// For testing need to adjust noise variance based on channel gain
#define sigmaSqrdDBmConst -174 + noiseFigureConst+ 10*log10(bandwidthRBGConst)
#define sigmaSqrdConst pow(10.0, ((sigmaSqrdDBmConst - 30.0)/10.0))
#define gpuAllocTypeConst 1 // 0 - non-consecutive type 0 allocate, 1 - consecutive type 1 allocate
#define cpuAllocTypeConst 0 // 0 - non-consecutive type 0 allocate, 1 - consecutive type 1 allocate
#define prdSchemeConst 1 // 0 - no precoding, 1 - SVD precoding
#define rxSchemeConst 1 // 1 - MMSE-IRC
// maximum number of scheduled UEs per cell per time slot is 1024

// max dimensions
#define maxNumCoorCellConst 21
#define maxNumBsAntConst 16
#define maxNumUeAntConst 16
#define maxNumPrbGrpsConst 100

// buffer size
#define estHfrSizeCOnst nPrbGrpsConst*totNumUesConst*numCoorCellConst*nBsAntConst*nUeAntConst

// PDSCH parameters
#define pdschNrOfSymbols 12
#define pdschNrOfDmrsSymb 1
#define pdschNrOfDataSymb pdschNrOfSymbols-pdschNrOfDmrsSymb
#define pdschNrOfLayers 1

// PF scheduling
#define initAvgRateConst 1.0
#define pfAvgRateUpdConst 0.001
#define betaCoeffConst 1.0
#define sinValThrConst 0.1
#define prioWeightStepConst 100
// power scaling
#define AFTER_SCALING_SIGMA_CONST 1.0 // noise std after scaling to improve precision
// 1.0 for 49.0 dBm BS Tx power

#define cpuGpuPerfGapPerUeConst 0.005
#define cpuGpuPerfGapSumRConst 0.01
// interference control
#define toleranceConst 0.4

// SVD precoder parameters
#define svdToleranceConst 1.e-7
#define svdMaxSweeps 15

// Normalized channel coefficients for __half range
#define amplifyCoeConst 1

// output file
#define mcOutputFile "output.txt"
#define mcOutputFileShort "output_short.txt"

#define targetChanCoeRangeConst 0.1f * nPrbGrpsConst * totNumUesConst // target channel coefficients range for precision issue
#define MinNoiseRangeConst 0.001f // minimum noise figure for stability issues

#ifdef CHANNEL_REUSE_
#define numRandTTI 50
#else
#define numRandTTI numSimChnRlz
#endif
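
The derived noise constants in this header can be checked by hand. The following is a minimal sketch, assuming the derived constants are products (for example bandwidthRBGConst = nPrbsPerGrpConst * 12 * scsConst) and using the values pasted above. The function names are hypothetical and are not part of cuMAC.

```cpp
#include <cmath>

// Values copied from the pasted parameters.h.
constexpr double kScsHz         = 30000.0; // scsConst
constexpr int    kPrbsPerGrp    = 4;       // nPrbsPerGrpConst
constexpr double kNoiseFigureDb = 9.0;     // noiseFigureConst

// Bandwidth of one PRB group in Hz: PRBs per group * 12 subcarriers * SCS.
// ASSUMPTION: the derived constants in the header are products of these terms.
inline double bandwidthRbgHz() { return kPrbsPerGrp * 12.0 * kScsHz; }

// Noise power in dBm: -174 dBm/Hz + noise figure + 10*log10(bandwidth).
inline double sigmaSqrdDbm() {
    return -174.0 + kNoiseFigureDb + 10.0 * std::log10(bandwidthRbgHz());
}

// Convert dBm to watts: subtract 30 dB, then undo the 10*log10.
inline double sigmaSqrdWatt() {
    return std::pow(10.0, (sigmaSqrdDbm() - 30.0) / 10.0);
}
```

With these inputs the PRB-group bandwidth is 1.44 MHz, which gives a noise power of roughly -103.4 dBm, or about 4.55e-14 W.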

Hi @sheng.li,

#define gpuAllocTypeConst 1 // 0 - non-consecutive type 0 allocate, 1 - consecutive type 1 allocate
#define cpuAllocTypeConst 0 // 0 - non-consecutive type 0 allocate, 1 - consecutive type 1 allocate

gpuAllocTypeConst and cpuAllocTypeConst must be set to the same value (both 0 or both 1) so that the CPU and GPU schedulers run the same algorithm. They cannot have different values.
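
One way to catch this kind of mismatch before running the simulation is a compile-time guard. The following is a minimal sketch and is not part of cuMAC: the two #define lines stand in for the values in parameters.h, and allocTypesMatch() is a hypothetical helper. It only makes sense for runs where the CPU scheduler serves as the reference for the GPU scheduler.

```cpp
// Stand-ins for the two settings from parameters.h (here both set to 1).
#define gpuAllocTypeConst 1 // 0 - non-consecutive type 0 allocate, 1 - consecutive type 1 allocate
#define cpuAllocTypeConst 1 // 0 - non-consecutive type 0 allocate, 1 - consecutive type 1 allocate

// Hypothetical guard: the build fails if either value is out of range
// or if the CPU and GPU allocation types differ.
static_assert(gpuAllocTypeConst == 0 || gpuAllocTypeConst == 1,
              "gpuAllocTypeConst must be 0 or 1");
static_assert(cpuAllocTypeConst == 0 || cpuAllocTypeConst == 1,
              "cpuAllocTypeConst must be 0 or 1");
static_assert(gpuAllocTypeConst == cpuAllocTypeConst,
              "CPU reference check needs the same allocation type on CPU and GPU");

// Hypothetical run-time equivalent, e.g. for logging before the simulation starts.
constexpr bool allocTypesMatch() { return gpuAllocTypeConst == cpuAllocTypeConst; }
```

With the values from the pasted file (GPU 1, CPU 0) the third static_assert would stop the build with its message.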

Thank you.

Hi @bkecicioglu ,

Thank you for your quick reply. I will give it a try.

Thanks.

Sheng


Hi @bkecicioglu, a quick question: did you test Rel 24-1? I just found that the parameters.h file I posted was from Rel 24-1.
thanks
Sheng

@sheng.li , please use the parameters.h file that matches the release version you are using. The two parameters we discussed here should still be set to the same value.

Thank you.

Hi @bkecicioglu, thanks for your quick reply. The parameters.h we got from Rel 24-1 is different from the one in Rel 24-2. Was this a bug fix in Rel 24-2?

@sheng.li , I can see the default settings for these two parameters are different between Rel 24-1 and 24-2. This was not a bug. We are sorry for the confusion.

The parameters should be set according to the test mode you choose. For example, the test you attempted runs the CPU reference check. This test mode requires all parameters to be the same for the CPU and the GPU.

The usage of the test program is explained as follows:

void usage()
{
    printf("cuMAC DL/UL scheduler pipeline test with [Arguments]\n");
    printf("Arguments:\n");
    printf("  -d  [Indication for DL/UL: 0 - UL, 1 - DL (default 1)]\n");
    printf("  -b  [Indication for baseline CPU RR scheduler/CPU reference check: 0 - CPU reference check, 1 - baseline CPU RR scheduler (default 0)]\n");
    printf("  -p  [Indication for using FP16 PRG allocation kernel: 0 - FP32, 1 - FP16 (default 0)]\n");
    printf("  -t  [Indication for saving TV before return: 0 - not saving TV, 1 - save TV for GPU scheduler, 2 - save TV for CPU scheduler, 3 - save per-cell TVs for testMAC/cuMAC-CP (default 0)]\n");
    printf("  -f  [Indication for choosing fast fading: 0 - Rayleigh fading, 1 - GPU TDL CFR on Prg, 2 - GPU TDL CFR on Sc and Prg (default 0)]\n"); // currently only CFR on Prg is used in network class, so 2 is not recommended
    printf("Example 1 (call cuMAC DL scheduler pipeline with CPU reference check): './multiCellSchedulerUeSelection'\n");
    printf("Example 2 (call cuMAC UL scheduler pipeline with CPU reference check): './multiCellSchedulerUeSelection -d 0'\n");
    printf("Example 3 (call cuMAC DL scheduler pipeline with baseline CPU RR scheduler): './multiCellSchedulerUeSelection -b 1'\n");
    printf("Example 4 (call cuMAC DL scheduler pipeline using GPU TDL channel): './multiCellSchedulerUeSelection -f <1 or 2>'\n");
    printf("Example 5 (create cuMAC test vector for DL: './multiCellSchedulerUeSelection -t 1'\n");
    // <channel_file> = ~/mnt/cuMAC/100randTTI570Ues2ta2raUMa_xpol_2.5GHz.mat
}

As you can see, the default value of the -b argument is 0 (CPU reference check).

Thanks.

Hi @bkecicioglu,

Thanks for the explanation, and sorry for my late response. We just installed 24-2 cuBB.

After the installation, starting from a fresh setup, I ran “multiCellSchedulerUeSelection -t 1”. This time all performance checks passed, but at the end I got failures saying that the UE/PRG/layer/MCS selection solutions do not match, as shown below:


Failure: CPU and GPU UE selection solutions do not match
Failure: CPU and GPU PRG allocation solutions do not match
Failure: CPU and GPU layer selection solutions do not match
Failure: CPU and GPU MCS selection solutions do not match
Failure: CPU and GPU channels do not match
CPU scheduler sum cell throughput: 4.358e+09
GPU scheduler sum cell throughput: 4.364e+09
CPU and GPU scheduler per-UE throughput performance check result: PASS
Largest gap (in percentage) between CPU and GPU per-UE throughput CDFs = 0.500000%
CPU and GPU scheduler sum throughput performance check result: PASS
Largest gap (in percentage) between CPU and GPU sum throughput curves = 0.722681%
CPU and GPU scheduler performance check result: PASS

Are those failures expected in 24-2 cuBB?

Thanks

Sheng

Hi @bkecicioglu,

It’s been a while since I asked this question. I noticed that on page 442 of the manual “Aerial CUDA-Accelerated RAN Release 24-2.1”, right after these two test examples, there is the statement “The assumption is that the simulation duration is long enough so that the scheduler algorithm’s performance converges.” Does this imply that only the performance check is valid, and not the selection checks? I just need clarification from NVIDIA to make sure our installation process is correct.

BTW, by default the GPU/CPU comparison is enabled in these tests (baseline == 0).

Thanks.

Sheng

Hi @sheng.li ,

The mismatch between the CPU and the GPU is caused by the difference in precision. This has been fixed in the upcoming release.
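
A generic illustration of how a precision difference alone can change a selection while leaving throughput almost untouched is sketched below. It is not cuMAC code: the metric values and function names are hypothetical, and the thread does not say which quantities differ in precision between the CPU and GPU paths.

```cpp
#include <cstddef>
#include <vector>

// Return the index of the largest element; ties go to the lowest index.
template <typename T>
std::size_t argmaxIndex(const std::vector<T>& v) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < v.size(); ++i) {
        if (v[i] > v[best]) best = i;
    }
    return best;
}

// Hypothetical per-UE scheduling metrics where UE 1 is ahead of UE 0 by 1e-8.
inline std::vector<double> metricsFp64() { return {1.00000001, 1.00000002, 0.5}; }

// The same metrics rounded to single precision. The spacing between floats
// near 1.0 is about 1.2e-7, so the first two values both round to 1.0f.
inline std::vector<float> metricsFp32() {
    std::vector<float> out;
    for (double m : metricsFp64()) out.push_back(static_cast<float>(m));
    return out;
}
```

In double precision the argmax picks UE 1, while in single precision the first two metrics tie and UE 0 is picked. The chosen UEs differ, yet their metrics differ by only about one part in 10^8, which is why a selection check can fail while a throughput check with a tolerance still passes.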

Please ignore the "Failure: CPU and GPU UE selection solutions do not match" errors on the 24-2 release.

The following result confirms the test was successful.

CPU and GPU scheduler per-UE throughput performance check result: PASS
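
The PASS/FAIL lines in this thread can be read against the two thresholds in the pasted parameters.h (identified later in the thread as the Rel 24-1 file). The sketch below assumes that a check passes when the largest gap does not exceed its threshold. The actual comparison in cuMAC is not shown in this thread, the thresholds in other releases may differ, and the function names are hypothetical.

```cpp
// Thresholds as they appear in the pasted parameters.h (fractions, not percent).
constexpr double cpuGpuPerfGapPerUeConst = 0.005; // 0.5 % for per-UE throughput CDFs
constexpr double cpuGpuPerfGapSumRConst  = 0.01;  // 1 % for sum throughput curves

// ASSUMPTION: a check passes when the gap is less than or equal to the threshold.
constexpr bool perUeCheckPasses(double gap) { return gap <= cpuGpuPerfGapPerUeConst; }
constexpr bool sumCheckPasses(double gap)   { return gap <= cpuGpuPerfGapSumRConst; }

// The overall result passes only if both individual checks pass.
constexpr bool overallCheckPasses(double perUeGap, double sumGap) {
    return perUeCheckPasses(perUeGap) && sumCheckPasses(sumGap);
}
```

Under these assumptions the first run's gaps (3.85 % and 9.52 %) fail both checks, and the later run's sum throughput gap (0.72 %) passes. The later per-UE gap is printed as 0.500000 %, which sits at the threshold, so its PASS is consistent with an inclusive comparison but does not prove it.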

Thank you.

Hi @bkecicioglu,

Thank you very much for the clarification. I will ignore the failures for now and look forward to the new release.

Thanks.

Sheng

