Issues generating 64T64R testMAC vectors via cuMAC (thread-block limit & 32-bit integer overflow)

Environment

  • Software: Aerial-CUDA-Accelerated-RAN (25.03)
  • Module: cuMAC (using generate_tv.py for cell/MAC scheduling HDF5 generation)
  • Hardware Setup: DU (cuPHY + testMAC) – RU Emulator
  • Status: Successfully validated the full 4T4R flow. Now attempting to scale to 64T64R.

Symptoms

While the 4T4R configuration works flawlessly, transitioning to 64T64R by setting --ant 64 and updating nBsAntConst = 64 in examples/parameters.h leads to the following critical failures:
Subband SINR Kernel Launch Failure (blockDim=0):
In multiCellSinrCal.cu, the launch-configuration calculation yields a block size of zero when nBsAnt = 64:

// multiCellSinrCal.cu
nMaxUeSinrCalPerRnd = floor(1024 / (64 * 64)); // Result: 0
numThrdPerBlk = 4096 * 0; // Result: 0

This causes cuLaunchKernel to return CUDA_ERROR_INVALID_VALUE. The current implementation assumes one UE per nBsAnt^2 threads within a 1024-thread block; at 64T64R a single UE already needs 64^2 = 4096 threads, so no UE fits in a block and the scheme cannot work as written.
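The arithmetic above can be reproduced on the host without touching the kernel. The following is a minimal Python sketch of that calculation, not the actual code from multiCellSinrCal.cu; the helper names `sinr_launch_config` and `check_launch_config` are hypothetical and exist only for illustration. It assumes one thread per element of an nBsAnt x nBsAnt matrix and a 1024-thread block limit, as described above.

```python
MAX_THREADS_PER_BLOCK = 1024  # per-block thread limit referred to above


def sinr_launch_config(n_bs_ant, max_threads=MAX_THREADS_PER_BLOCK):
    """Hypothetical helper mirroring the quoted launch arithmetic.

    Returns (UEs processed per round, threads per block).
    """
    threads_per_ue = n_bs_ant * n_bs_ant            # one thread per matrix element
    ues_per_round = max_threads // threads_per_ue   # integer floor, as in the snippet
    threads_per_block = threads_per_ue * ues_per_round
    return ues_per_round, threads_per_block


def check_launch_config(n_bs_ant):
    """Fail early with a readable message instead of launching with blockDim = 0."""
    ues_per_round, threads_per_block = sinr_launch_config(n_bs_ant)
    if threads_per_block == 0:
        raise ValueError(
            f"nBsAnt={n_bs_ant}: one UE needs {n_bs_ant * n_bs_ant} threads, "
            f"which exceeds the {MAX_THREADS_PER_BLOCK}-thread block limit"
        )
    return ues_per_round, threads_per_block


if __name__ == "__main__":
    for n in (4, 8, 16, 32, 64):
        print(n, sinr_launch_config(n))
```

Under these assumptions, 32 antennas is the largest count that still gives a non-zero block (one UE per 1024-thread block); 64 antennas gives zero UEs per round and therefore a zero-sized block.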
Hard-coded CPU LUTs:
The MCS/Layer selection logic in the CPU path appears to be strictly tuned for 4T4R, throwing exceptions when 64 antennas are defined.

What we have checked

  • Confirmed that 4T4R Test Vectors (TV) generated by generate_tv.py pass the RU-cuPHY-testMAC integration test.
  • Identified that the current multiCellSinrCal kernel relies on Shared Memory and thread synchronization that cannot scale beyond 1024 threads, making 64x64 matrix inversion impossible in the current block structure.
  • Noticed that NVIDIA documentation mentions successful 64T64R testMAC tests, which implies a functional TV generation path exists.

Questions / Requests

Official 64T64R TV Generation Path: Does NVIDIA internally use the public cuMAC/generate_tv.py infrastructure to generate 64T64R H5 files? If so, is there a specific branch or a “Massive MIMO” parameter set that resolves the kernel launch and overflow issues?
Massive MIMO Reference Implementation: Is there a plan to update multiCellSinrCal to support 64x64 matrices within the Aerial-CUDA-Accelerated-RAN repository?
Recommended Workaround: For users who need 64T64R TVs today, what is the supported procedure? Should we:

  • Manually bypass Subband SINR calculation and use Wideband values?
  • Use a different toolchain or script not included in the current public release?
  • Modify the example code to utilize a multi-block or library-based inversion approach (e.g., cuBLAS/cuSOLVER)?
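To make the multi-block option concrete, here is a rough Python sketch of the launch geometry it would imply. It is an assumption on our side, not anything taken from cuMAC: the helper `tiled_launch_geometry` is hypothetical, and it only works out how a 64x64 matrix could be split into tiles that each fit a 1024-thread block. It does not address the harder part, namely that an inversion spread over several blocks needs cross-block synchronization or a library routine.

```python
import math

MAX_THREADS_PER_BLOCK = 1024  # per-block thread limit referred to above


def tiled_launch_geometry(n_bs_ant, n_ue, max_threads=MAX_THREADS_PER_BLOCK):
    """Hypothetical helper: split each n_bs_ant x n_bs_ant matrix into square
    tiles so that one tile maps to one thread block."""
    # Largest square tile edge that fits in one block (32 for 1024 threads).
    tile_edge = min(n_bs_ant, math.isqrt(max_threads))
    # Ceiling division without floating point.
    tiles_per_dim = (n_bs_ant + tile_edge - 1) // tile_edge
    return {
        "block_dim": (tile_edge, tile_edge),
        "grid_dim": (tiles_per_dim, tiles_per_dim, n_ue),  # z index selects the UE
        "blocks_per_ue": tiles_per_dim * tiles_per_dim,
    }


if __name__ == "__main__":
    # n_ue=16 is an arbitrary illustrative value.
    print(tiled_launch_geometry(n_bs_ant=64, n_ue=16))
```

With a 1024-thread limit this gives 32x32 tiles, so each 64x64 matrix would span four blocks, whereas a 4x4 matrix still fits in a single block.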

We are looking for guidance on how to properly generate 64T64R test vectors to continue our RU-DU integration. Any example configurations or specific documentation regarding Massive MIMO TV generation would be highly appreciated.

Hi @jacob33,

Please see below:

  • Do not use generate_tv.py --ant 64/multiCellSchedulerUeSelection for 64T64R TV generation in the current public tree.
  • multiCellSinrCal is used only for 4T4R scheduler pipeline testing and is not needed for 64T64R scheduler testing. It will not be extended to 64T64R SINR calculation in the short term.
  • For cuMAC 64TR standalone scheduler pipeline validation (UE sorting + UE pairing + MCS selection), use multiCellMuMimoScheduler -c cuMAC/examples/multiCellMuMimoScheduler/config.yaml or cuMAC/scripts/cumac_64tr_test.py.
  • CPU versions of MCS/Layer selection are currently not fully supported for 64T64R. We will extend them for 64T64R support in future ACAR releases.
  • The formal cuMAC 64TR MU-MIMO UE grouping implementation for L2 stack integration is provided under cuMAC/src/muMimoUserPairing, with the interfacing/integration library header cuMAC/lib/cumac_muUeGrp.h.
    • Official cuMAC-CP full support for 64TR MU-MIMO UE grouping will be offered in ACAR Release 26-2, and is currently not available in Release 26-1.
  • For cuMAC 64TR MU-MIMO UE grouping L2 stack integration validation and test vector generation, use cuMAC/examples/muMimoUeGrpL2Integration with ENABLE_TV_TEST_MODE=true and TV_SAVE_ALL_SLOTS=true.

Let us know if you are still having issues.

Thank you.