Regarding cuMAC/cuPHY real-time operation

The base station dictates the timing (frame/slot/symbol) and carrier frequencies; UEs are slaved to this timing and frequency framework. How is cuBB/cuPHY designed to maintain this timing? Traffic differs across cells and each user's traffic is dynamic, so how is cuBB/cuPHY designed to allocate resources (GPU threads/storage)? Answers to these questions will help us estimate the workload of improving cuBB/cuPHY to support real-product development.

  1. Is there a timing signal in cuBB/cuPHY that controls the pipeline processing? Where does the timing signal come from, and how does each pipeline follow it with its local clock count? How is a pipeline handled when it reaches its deadline? How is the deadline set: one for all pipelines, or one for each?
  2. Is there a priority assigned to a pipeline or task (varying from slot to slot and user to user) that cuBB/cuPHY uses to allocate resources, schedule/dispatch the pipelines/tasks, monitor timing, and adjust task allocation if needed, especially since the complexity of a pipeline is data dependent?
  3. The number of users varies, and the processing complexity of each user can differ (e.g. number of RBs, number of layers, QAM order, etc.). How is cuBB/cuPHY designed to allocate resources (GPU threads and storage) to users/tasks (a big user may have multiple tasks)? Fixed or dynamic? If dynamic, what are the rules of the allocation?
  4. The traffic of different sectors/cells/carriers (in carrier aggregation) is dynamic. How is cuBB/cuPHY designed to allocate resources among cells: optimized allocation among the sectors/cells/carriers, or fixed allocation?

Hi @weizhong.chen,

Please see below and let me know if you have any other questions.

How is cuBB/cuPHY designed to maintain the timing?

The frame/slot timing is tracked with the help of the PTP service on the CPU side. The cuPHY driver (under cuPHY-CP) tracks slot timing and schedules the workload on the GPU. After launching the different workloads on the GPU, it monitors their completion with the CUDA event (cudaEvent) framework.
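
As a hedged illustration of that completion-tracking pattern (a standalone sketch, not cuPHY-CP source; the kernel and names are placeholders), a slot's workload can be enqueued on a stream and its completion polled through a CUDA event:

```cuda
// Illustrative sketch (not cuPHY-CP code): launch work on a stream and
// monitor completion with CUDA events, as described above.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void slotWorkload() { /* stand-in for a PHY channel kernel */ }

int main() {
    cudaStream_t stream;
    cudaEvent_t done;
    cudaStreamCreate(&stream);
    cudaEventCreate(&done);

    slotWorkload<<<1, 32, 0, stream>>>();   // enqueue the slot's work
    cudaEventRecord(done, stream);          // marker after the workload

    // Poll instead of blocking, so the CPU thread can keep tracking
    // slot timing (cuPHY derives its time base from PTP on the CPU side).
    while (cudaEventQuery(done) == cudaErrorNotReady) { /* spin or yield */ }

    printf("slot workload complete\n");
    cudaEventDestroy(done);
    cudaStreamDestroy(stream);
    return 0;
}
```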

1. Is there a timing signal in cuBB/cuPHY that controls the pipeline processing? Where does the timing signal come from, and how does each pipeline follow it with its local clock count? How is a pipeline handled when it reaches its deadline? How is the deadline set: one for all pipelines, or one for each?

The previous answer should address the first three questions. There is no deadline set for cuPHY pipelines. A small timing jitter can be expected, but pipelines typically do not exceed their designed execution time. If a pipeline is stuck or takes a very long time, cuPHY-CP has a mechanism to let the cuPHY pipelines recover or to restart the application.

2. Is there a priority assigned to a pipeline or task (varying from slot to slot and user to user) that cuBB/cuPHY uses to allocate resources, schedule/dispatch the pipelines/tasks, monitor timing, and adjust task allocation if needed, especially since the complexity of a pipeline is data dependent?

The main mechanism is the CUDA MPS (Multi-Process Service) framework, which assigns different resources to pipelines. This is done with a static configuration in the cuphycontroller yaml file by assigning a maximum number of SMs to each PHY channel pipeline. The recommended values shared with each of our releases are based on extensive empirical performance evaluations. The SMs are still allocated dynamically at run time depending on the workload, but the number of SMs cannot exceed the values configured in the yaml file.
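
For illustration only, the static per-channel caps might look like the fragment below; the key names and values here are hypothetical, since the actual cuphycontroller yaml schema and recommended SM counts are release-specific:

```yaml
# Hypothetical sketch of per-channel SM caps in the cuphycontroller yaml;
# actual key names and recommended values are release-specific.
cuphydriver_config:
  pdsch_max_sm: 16   # upper bound on SMs the PDSCH pipeline may occupy
  pusch_max_sm: 24   # upper bound for PUSCH; within these caps, SMs are
  pucch_max_sm: 4    # still assigned dynamically at run time via MPS
```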

3. The number of users varies, and the processing complexity of each user can differ (e.g. number of RBs, number of layers, QAM order, etc.). How is cuBB/cuPHY designed to allocate resources (GPU threads and storage) to users/tasks (a big user may have multiple tasks)? Fixed or dynamic? If dynamic, what are the rules of the allocation?

It is mostly dynamic. The cuPHY driver receives the L1 configuration from the L2 adapter for each slot and prepares the cuPHY workloads for each channel. This is done every slot, so memory usage can fluctuate with the load. SM resources are managed dynamically by the MPS service, as explained above.

4. The traffic of different sectors/cells/carriers (in carrier aggregation) is dynamic. How is cuBB/cuPHY designed to allocate resources among cells: optimized allocation among the sectors/cells/carriers, or fixed allocation?

It is a dynamic allocation (with the exception of static parameters). The resource allocation happens on a per-slot basis and fluctuates with the traffic demand in each cell.

Thank you.

Thanks a lot, very helpful.

I have a further question regarding your answer to question 2.

Further question: how is the static configuration (resource requirement) estimated? Is it calculated online by the cuPHY controller/driver based on test vectors and profiling results, or pre-calculated for each test vector?

%%%%%%%%%%%%%
I am asking these questions because we are thinking of improving the cuPHY algorithms for better link performance and more efficient implementations for more system capacity (at least 2X). Since we are going to change cuPHY and add new building blocks to it, we are evaluating how big the surgery will be in cuPHY-CP and the cuPHY controller.

Currently, we are trying to get your reference system running so we can experiment with it and learn more about the system, which will let us evaluate the complexity of putting the new pieces of the puzzle together.

We are not sure whether NVIDIA is interested in our work (better cuPHY algorithms and a more efficient implementation); if so, NVIDIA could join us and make it happen more quickly. The end result will certainly make cuBB more attractive to everyone.

Weizhong Chen