CBUF size of NVDLA on Orin

Hi. I am a Ph.D. student working on computer architecture, especially Deep Neural Network accelerator design. We are interested in the NVDLA accelerator open-sourced by NVIDIA, and in particular in its total on-chip SRAM size, which we need in order to make a fair comparison against NVDLA in one of our research projects.

I know that the NVDLAs on AGX Orin and AGX Xavier have an SRAM, and that its size is published in the TensorRT Developer Guide.

However, I have also read in the open-source NVDLA Unit Description that each NVDLA contains a convolution buffer (CBUF), which is another on-chip SRAM component. The NVDLA Unit Description says that the CBUF is 512KB.

Do the NVDLAs on AGX Orin and AGX Xavier both have a 512KB CBUF? If not, could you kindly provide the CBUF size of the NVDLA accelerators on AGX Orin and AGX Xavier, respectively? If the data is not publicly available, could you please share it with us privately (e.g. by email)? Thank you very much!


They are the same thing. The DLA's internal SRAM is called CBUF.
So you can find the CBUF limit for Xavier and Orin in the TensorRT guide you shared above.


Thanks for your reply. However, I still suspect that the CBUF and the (dedicated) SRAM are distinct hardware components.

According to the NVDLA Primer, a large NVDLA implementation contains a dedicated SRAM in addition to the internal CBUF. The dedicated SRAM connects to NVDLA through a dedicated interface (the second DBB in Figure 2 of the NVDLA Primer).

Given certain descriptions in the TensorRT Developer Guide, I believe that document is describing the dedicated SRAM rather than the CBUF. For example, it says:

On Xavier, 4 MiB of SRAM is shared across multiple cores including the 2 DLA cores.

This seems to imply that the 4 MiB SRAM is not an internal component of a DLA (i.e. not a CBUF).

So could you please investigate it further? Thank you very much for your invaluable assistance!


Sorry for the oversight.

Indeed, there are two SRAMs on the DLA, called UBuf and CBuf.
The UBuf size can be found in the TensorRT document:

On Orin, each DLA core has 1 MiB of dedicated SRAM. On Xavier, 4 MiB of SRAM is shared across multiple cores including the 2 DLA cores.

Unfortunately, for CBuf we don’t have any public information that we can share.
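
For the comparison mentioned in the original question, the figures quoted in this thread could be tabulated roughly as in the sketch below. It is only an illustration: the dictionary layout and the function name are made up for this example, the production CBuf sizes are left as unknown because they are not public, and treating the 512KB from the open-source NVDLA Unit Description as 512 KiB, or as the CBuf size of Orin or Xavier at all, is an explicit assumption that nothing in this thread confirms.

```python
from typing import Optional

KIB = 1024
MIB = 1024 * KIB

# Figures quoted in this thread. None marks a value that is not public.
# "ubuf_shared" records whether the SRAM is shared with other cores.
DLA_SRAM = {
    "Orin": {"ubuf_bytes": 1 * MIB, "ubuf_shared": False, "cbuf_bytes": None},
    "Xavier": {"ubuf_bytes": 4 * MIB, "ubuf_shared": True, "cbuf_bytes": None},
}

# CBUF size from the open-source NVDLA Unit Description, read here as 512 KiB.
# ASSUMPTION: it may not match the CBuf on Orin or Xavier.
OPEN_SOURCE_CBUF_BYTES = 512 * KIB


def on_chip_sram_bytes(platform: str,
                       assume_open_source_cbuf: bool = False) -> Optional[int]:
    """Return UBuf + CBuf in bytes, or None when the CBuf size is unknown.

    For a shared UBuf (Xavier) the result is only an upper bound on what a
    single DLA core can use, since other cores share the same SRAM.
    """
    entry = DLA_SRAM[platform]
    cbuf = entry["cbuf_bytes"]
    if cbuf is None and assume_open_source_cbuf:
        cbuf = OPEN_SOURCE_CBUF_BYTES  # hypothetical stand-in value
    if cbuf is None:
        return None
    return entry["ubuf_bytes"] + cbuf


if __name__ == "__main__":
    for name in DLA_SRAM:
        print(name,
              on_chip_sram_bytes(name),
              on_chip_sram_bytes(name, assume_open_source_cbuf=True))
```

Without the assumption the function returns None for both platforms, which reflects what is actually public; any number obtained with `assume_open_source_cbuf=True` should be reported as an estimate.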