Reducing size of "shared-dma-pool" reserved memory region

On the Orin Nano 4GB, the following region is reserved by default, ostensibly for “nvgpu comptags”:

	reserved-memory {
		linux,cma { /* Needed for nvgpu comptags */
			compatible = "shared-dma-pool";
			reusable;
			size = <0x0 0x10000000>; /* 256MB */
			alignment = <0x0 0x10000>;
			linux,cma-default;
			status = "okay";
		};
	};
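For readers unfamiliar with the two-cell notation, the `size` and `alignment` values above can be decoded with a little arithmetic. The following is a minimal Python sketch, assuming the `reserved-memory` node uses two 32-bit size cells (which the two-value properties imply). The helper names are made up for illustration, and the 64 MiB figure is only a hypothetical input to show the conversion, not a value known to work.

```python
def cells_to_bytes(hi, lo):
    """Combine two 32-bit device tree cells (high cell first) into one value."""
    return (hi << 32) | lo


def bytes_to_cells(size, alignment):
    """Round size up to a multiple of alignment, then split into two 32-bit cells."""
    rounded = -(-size // alignment) * alignment  # ceiling division, then scale back
    return rounded >> 32, rounded & 0xFFFFFFFF


default_size = cells_to_bytes(0x0, 0x10000000)
alignment = cells_to_bytes(0x0, 0x10000)
print(default_size // (1024 * 1024), "MiB pool,", alignment // 1024, "KiB alignment")

# Hypothetical smaller pool, expressed as cells for the size property.
hi, lo = bytes_to_cells(64 * 1024 * 1024, alignment)
print(f"size = <0x{hi:x} 0x{lo:x}>;")
```

This only covers the encoding of the property; it says nothing about how small the pool can be made before the driver fails.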

Looking at the cbc_status file, it looks like only ~15 MB is being allocated for the comptag store. This makes sense for our system memory capacity based on my reading of the code.

$ cat /sys/kernel/debug/17000000.gpu/cbc_status
cbc.compbit_backing_size: 14876672
cbc.comptags_per_cacheline: 32
cbc.gobs_per_comptagline_per_slice: 16
cbc.max_comptag_lines: 58000
cbc.comp_tags.size: 57999
cbc.compbit_store.base_hw: 1365504
cbc.compbit_store.mem.aperture: 1
cbc.compbit_store.mem.size: 14942208
cbc.compbit_store.mem.aligned_size: 14942208
cbc.compbit_store.mem.gpu_va: 0
cbc.compbit_store.mem.skip_wmb: 0
cbc.compbit_store.mem.free_gpu_va: 0
cbc.compbit_store.mem.mem_flags: 0
cbc.compbit_store.mem.cpu_va: ffff0000cd600000
cbc.compbit_store.mem.pa: 14d600000
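To put those numbers next to the 256 MB pool, here is a rough Python sketch that parses `cbc_status`-style text and reports the compbit store size as a share of the pool. The function names are hypothetical, the sample is a subset of the output above, and the comparison does not account for any other allocations that may come from the same default CMA pool.

```python
SAMPLE = """\
cbc.compbit_backing_size: 14876672
cbc.max_comptag_lines: 58000
cbc.compbit_store.mem.size: 14942208
cbc.compbit_store.mem.aligned_size: 14942208
"""


def parse_cbc_status(text):
    """Split 'key: value' lines into a dict of strings (some values are hex)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize(text, pool_bytes=0x10000000):
    """Compare the compbit store size with the pool size (default: 256 MiB)."""
    fields = parse_cbc_status(text)
    store = int(fields["cbc.compbit_store.mem.size"])
    return {
        "store_mib": store / 2**20,
        "pool_mib": pool_bytes / 2**20,
        "store_fraction_of_pool": store / pool_bytes,
    }


summary = summarize(SAMPLE)
print(f"{summary['store_mib']:.2f} MiB of {summary['pool_mib']:.0f} MiB "
      f"({summary['store_fraction_of_pool']:.1%})")
```

On a device, the text would come from reading /sys/kernel/debug/17000000.gpu/cbc_status instead of the embedded sample.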

However, when reducing the shared-dma-pool reserved region size, I encounter the following error on boot:

[    6.211899] nvgpu: 17000000.gpu            ga10b_fb_cbc_configure:100  [ERR]  CBC marked unsafe by MMU, check cbc config
[    6.225275] nvgpu: 17000000.gpu   nvgpu_timeout_expired_msg_retry:118  [ERR]  No more retries @ tu104_cbc_ctrl+0x21c/0x440 [nvgpu] 
[    6.225282] nvgpu: 17000000.gpu                    tu104_cbc_ctrl:192  [ERR]  comp tag clear timeout

What am I missing here? Further, for compute-only (no graphics) workloads, is it reasonable to disable CONFIG_NVGPU_COMPRESSION entirely?

Even with this error on boot, GPU workloads still run and I don’t recall encountering any performance issues, though I should retest.

*** Please note that this reply is generated by LLM automatically ***

Based on the provided context, I’ll try to address your question.

The reserved region you’re referring to is likely the nvgpu comptags region, an area of system memory reserved for the NVIDIA GPU to store its comptags (compression tags). The GPU uses comptags to track the compression state of compressible memory.

The cbc_status file indicates that only about 15 MB is being allocated for the comptag store, which might seem small relative to the 256 MB reserved region. However, this allocation is likely based on the specific requirements of the Jetson Orin Nano 4GB module.

When you try to reduce the shared-dma-pool reserved region size, you encounter an error on boot. This is likely because the reduced size is not sufficient to accommodate the comptag store requirements.

Disabling the comptag store entirely might not be a viable option, even for compute-only workloads without graphics. The comptag store is a critical component for the GPU’s memory management, and disabling it might lead to undefined behavior, crashes, or performance issues.

Although you mention that GPU workloads still run despite the error on boot, it’s essential to retest and verify that there are no performance issues or other problems. Disabling the comptag store might lead to issues that are not immediately apparent.

To resolve the issue, you could try the following:

  1. Verify that the shared-dma-pool reserved region size is set to a value that is sufficient to accommodate the comptag store requirements. You can check the recommended size in the NVIDIA documentation or by consulting with NVIDIA support.
  2. If you’re using a custom kernel or device tree, ensure that the comptag store is properly configured and allocated.
  3. If you’re experiencing issues with the comptag store allocation, consider reaching out to NVIDIA support for further assistance.
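As one way to approach the first check, the sketch below reads the `CmaTotal` and `CmaFree` fields that Linux reports in /proc/meminfo when CMA is enabled and compares the total with the compbit store size from cbc_status. It is a minimal Python illustration with hypothetical helper names and made-up sample values. Passing this check is a necessary condition at best and does not imply that the driver will initialize cleanly.

```python
import re


def cma_info(meminfo_text):
    """Return CmaTotal and CmaFree in bytes from /proc/meminfo-style text."""
    values = {}
    pattern = r"^(CmaTotal|CmaFree):\s+(\d+)\s+kB$"
    for name, kib in re.findall(pattern, meminfo_text, re.MULTILINE):
        values[name] = int(kib) * 1024
    return values


def pool_covers_store(meminfo_text, store_bytes):
    """True if the CMA pool is at least as large as the compbit store."""
    return cma_info(meminfo_text).get("CmaTotal", 0) >= store_bytes


# Illustrative values only, not captured from a real device.
SAMPLE_MEMINFO = "MemTotal:  3500000 kB\nCmaTotal:  262144 kB\nCmaFree:  200000 kB\n"

print(cma_info(SAMPLE_MEMINFO))
print(pool_covers_store(SAMPLE_MEMINFO, 14942208))
```

On a device, `meminfo_text` would be the contents of /proc/meminfo, and `store_bytes` would be taken from `cbc.compbit_store.mem.size`.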

In summary, while it might seem reasonable to disable the comptag store for compute-only workloads, it’s not recommended to do so without proper testing and verification. The comptag store is a critical component for the GPU’s memory management, and disabling it might lead to issues that are not immediately apparent.


Hi,
As stated in the comment above, this customization is not recommended.
