Environment
- Software: Aerial-CUDA-Accelerated-RAN (25.03)
- Module: cuMAC (using generate_tv.py for cell/MAC scheduling HDF5 generation)
- Hardware Setup: DU (cuPHY + testMAC) – RU Emulator
- Status: Successfully validated the full 4T4R flow. Now attempting to scale to 64T64R.
Symptoms
While the 4T4R configuration works, switching to 64T64R by setting --ant 64 and updating nBsAntConst = 64 in examples/parameters.h leads to the following failures:
Subband SINR Kernel Launch Failure (blockDim=0):
In multiCellSinrCal.cu, the launch configuration calculation leads to an invalid value for Massive MIMO:
// multiCellSinrCal.cu
nMaxUeSinrCalPerRnd = floor(1024 / (64 * 64)); // Result: 0
numThrdPerBlk = 4096 * 0; // Result: 0
This causes cuLaunchKernel to return CUDA_ERROR_INVALID_VALUE. The current implementation assumes one UE per nBsAnt^2 threads within the 1024-thread-per-block limit. With nBsAnt = 64, a single UE already needs 64 * 64 = 4096 threads, so this scheme cannot work for 64T64R as written.
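To make the arithmetic concrete, here is a minimal host-side sketch of the calculation as we understand it. The struct and function names (SinrLaunchCfg, sinrLaunchCfg) are ours and purely illustrative; only nMaxUeSinrCalPerRnd and numThrdPerBlk correspond to names in multiCellSinrCal.cu, and the real code may compute them differently.

```cpp
// Hypothetical mirror of the launch-configuration arithmetic quoted above.
// Not taken from the Aerial source; names other than those in comments are ours.

// CUDA's maximum number of threads per block.
const unsigned kMaxThreadsPerBlock = 1024;

struct SinrLaunchCfg {
    unsigned uesPerRound;      // corresponds to nMaxUeSinrCalPerRnd
    unsigned threadsPerBlock;  // corresponds to numThrdPerBlk
};

// Assumes nBsAnt >= 1. One UE is assigned nBsAnt * nBsAnt threads.
inline SinrLaunchCfg sinrLaunchCfg(unsigned nBsAnt) {
    unsigned threadsPerUe = nBsAnt * nBsAnt;
    // Unsigned integer division truncates, which equals floor() here.
    unsigned uesPerRound = kMaxThreadsPerBlock / threadsPerUe;
    return {uesPerRound, threadsPerUe * uesPerRound};
}
```

With nBsAnt = 4 this gives 64 UEs per round and 1024 threads per block. With nBsAnt = 64 both values collapse to 0, which matches the blockDim=0 launch failure we see.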
Hard-coded CPU LUTs:
The MCS/Layer selection logic in the CPU path appears to be hard-coded for 4T4R and throws exceptions when 64 antennas are configured.
What we have checked
- Confirmed that 4T4R Test Vectors (TV) generated by generate_tv.py pass the RU-cuPHY-testMAC integration test.
- Identified that the current multiCellSinrCal kernel relies on shared memory and thread synchronization within a single thread block, which cannot scale beyond 1024 threads. This makes a 64x64 matrix inversion impossible with the current block structure.
- Noticed that NVIDIA documentation mentions successful 64T64R testMAC tests, which implies a functional TV generation path exists.
Questions / Requests
Official 64T64R TV Generation Path: Does NVIDIA internally use the public cuMAC/generate_tv.py infrastructure to generate 64T64R H5 files? If so, is there a specific branch or a “Massive MIMO” parameter set that resolves the kernel launch and overflow issues?
Massive MIMO Reference Implementation: Is there a plan to update multiCellSinrCal to support 64x64 matrices within the Aerial-CUDA-Accelerated-RAN repository?
Recommended Workaround: For users who need 64T64R TVs today, what is the supported procedure? Should we:
- Manually bypass Subband SINR calculation and use Wideband values?
- Use a different toolchain or script not included in the current public release?
- Modify the example code to utilize a multi-block or library-based inversion approach (e.g., cuBLAS/cuSOLVER)?
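For the last option, the sketch below shows the kind of launch geometry we have in mind: cap the block at 1024 threads and let each thread loop over several matrix elements when nBsAnt^2 exceeds that limit. This is only our assumption of what a fix could look like. StridedSinrCfg and stridedSinrCfg are hypothetical names, and the sketch covers the launch arithmetic only, not the shared-memory footprint or the inversion algorithm itself.

```cpp
// Hypothetical launch geometry that never collapses to zero threads.
// Illustrative only; not part of the Aerial-CUDA-Accelerated-RAN code base.

// CUDA's maximum number of threads per block.
const unsigned kBlockThreadLimit = 1024;

struct StridedSinrCfg {
    unsigned threadsPerBlock;  // always in [1, kBlockThreadLimit]
    unsigned elemsPerThread;   // matrix elements each thread iterates over
    unsigned uesPerBlock;      // UEs processed by one block per round
};

// Assumes nBsAnt >= 1.
inline StridedSinrCfg stridedSinrCfg(unsigned nBsAnt) {
    unsigned elemsPerUe = nBsAnt * nBsAnt;
    if (elemsPerUe <= kBlockThreadLimit) {
        // Small arrays: one thread per element, several UEs per block.
        unsigned ues = kBlockThreadLimit / elemsPerUe;
        return {elemsPerUe * ues, 1, ues};
    }
    // Large arrays: one UE per block, each thread strides over
    // ceil(elemsPerUe / kBlockThreadLimit) elements.
    unsigned perThread = (elemsPerUe + kBlockThreadLimit - 1) / kBlockThreadLimit;
    return {kBlockThreadLimit, perThread, 1};
}
```

Under these assumptions, nBsAnt = 4 keeps the existing behaviour (1024 threads, 64 UEs per block), while nBsAnt = 64 yields 1024 threads that each cover 4 of the 4096 matrix elements. Whether the kernel body can be restructured this way is exactly what we would like guidance on.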
We are looking for guidance on how to properly generate 64T64R test vectors to continue our RU-DU integration. Any example configurations or specific documentation regarding Massive MIMO TV generation would be highly appreciated.