However, installing this on the Jetson Thor and enabling the profile (and rebooting) does not seem to give the expected results. Running a network with trtexec gives roughly the same inference time as with the MAXN profile, and jtop reports approximately 110W of power usage.
The profile appears to correctly request:
GPU_POWER_GATING GPU_PG_MASK 15873
GPU_POWER_CONTROL_ENABLE GPU_PWR_CNTL_EN on
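To see what the value 15873 actually requests, it can be decoded into its bit pattern. The bash sketch below only reports which bits are set; it makes no claim about whether a set bit means a TPC is gated or enabled, because that mask semantics is not documented in this thread. `decode_pg_mask` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Decode a GPU_PG_MASK value into hex and list the bit positions that are set.
# Assumption: one bit per GPU unit (e.g. TPC); the meaning of a set bit
# ("gated" vs "enabled") is NOT determined by this function.
decode_pg_mask() {
  local mask=$1 i
  local -a bits=()
  for (( i = 0; i < 32; i++ )); do
    if (( (mask >> i) & 1 )); then
      bits+=("$i")
    fi
  done
  printf 'hex: 0x%X\n' "$mask"
  printf 'set bits: %s\n' "${bits[*]}"
  printf 'count: %d\n' "${#bits[@]}"
}

# Example: decode_pg_mask 15873
```

For 15873 this prints `hex: 0x3E01`, with six bits set (0 and 9 through 13).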
After reboot, we would expect GPU_PG_MASK to be written here:
GPU_PG_MASK /sys/bus/pci/devices/0000:01:00.0/gpu_pg_mask
but no gpu_pg_mask file exists.
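A quick way to confirm whether the node is present, sketched in bash: the function takes any sysfs path and defaults to the gpu_pg_mask path quoted above. `check_sysfs_node` is a hypothetical helper, not part of nvpmodel.

```bash
#!/usr/bin/env bash
# Report whether a sysfs attribute exists and, if it does, print its value.
# Default: the gpu_pg_mask path that nvpmodel is expected to write to.
check_sysfs_node() {
  local node=${1:-/sys/bus/pci/devices/0000:01:00.0/gpu_pg_mask}
  if [[ -e $node ]]; then
    printf 'present: %s = %s\n' "$node" "$(cat "$node")"
  else
    printf 'missing: %s\n' "$node"
    return 1
  fi
}

# Usage on the target: check_sysfs_node
```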
Is there something else that needs to be set to change the number of TPCs in use?
Hey Lee, I ran into a similar issue before; it turned out the gpu_pg_mask file isn’t always exposed, depending on the Jetson firmware or driver version. You might need to update your JetPack or reapply the nvpmodel.conf with root permissions. Also, double-check that your custom 70W profile is properly registered in /etc/nvpmodel.conf and re-select it with sudo nvpmodel -m and its mode ID. Sometimes a full cold reboot (not just a restart) helps the new power settings take effect.
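To check that a custom profile is actually registered, the mode IDs and names can be listed from the config file. This bash sketch assumes each mode header in nvpmodel.conf has the form `< POWER_MODEL ID=<n> NAME=<name> >`; adjust the pattern if your file differs. `list_power_models` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# List "ID NAME" for every POWER_MODEL header in an nvpmodel.conf file.
# Assumed header format: < POWER_MODEL ID=<n> NAME=<name> >
list_power_models() {
  local conf=${1:-/etc/nvpmodel.conf} line
  local re='<[[:space:]]*POWER_MODEL[[:space:]]+ID=([0-9]+)[[:space:]]+NAME=([^[:space:]>]+)'
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $re ]]; then
      printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    fi
  done < "$conf"
}

# Usage: list_power_models
# then select a listed ID with: sudo nvpmodel -m <ID>
# and confirm the active mode with: nvpmodel -q
```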
Download nvpmodel configuration file
Configure the target system with the downloaded nvpmodel.conf file.
● Copy the downloaded nvpmodel.conf file to the target platform.
● Set nvpmodel mode to 0:
# nvpmodel -m 0
● Stop the nvpmodel service:
# systemctl stop nvpmodel
● Apply the nvpmodel.conf file:
# cp /etc/nvpmodel.conf ~/
# cp <downloaded_nvpmodel_conf_path>/nvpmodel.conf /etc/nvpmodel.conf
Note: Do not overwrite the nvpmodel.conf file before the nvpmodel service is stopped.
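The configuration steps above can be sketched as a single bash function. It assumes it is run as root, as the `#` prompts indicate. The `RUN=echo` dry-run switch is a convenience added here, not part of the original instructions; `apply_nvpmodel_conf` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Switch to mode 0, stop the nvpmodel service, back up the current config,
# then install the downloaded one (in that order, per the note above).
# Set RUN=echo to print the commands instead of executing them.
apply_nvpmodel_conf() {
  local new_conf=$1
  local run=${RUN:-}
  if [[ -z $new_conf ]]; then
    echo "usage: apply_nvpmodel_conf <downloaded_nvpmodel_conf_path>/nvpmodel.conf" >&2
    return 2
  fi
  $run nvpmodel -m 0 || return
  # The service must be stopped before /etc/nvpmodel.conf is overwritten.
  $run systemctl stop nvpmodel || return
  $run cp /etc/nvpmodel.conf ~/ || return
  $run cp "$new_conf" /etc/nvpmodel.conf
}

# Dry run: RUN=echo apply_nvpmodel_conf /path/to/nvpmodel.conf
```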
We have tried several cold boots and reboots, changing the nvpmodel mode with sudo, and copying nvpmodel.conf while the service was stopped. All config changes to the number of active CPU cores and to the CPU and GPU clock frequencies take effect correctly; only GPU_POWER_GATING does not seem to work.
I suspect the error is related to this:
NVPM VERB: PARAM GPU_POWER_GATING: ARG GPU_PG_MASK: PATH /dev/null: REAL_VAL: on CONF_VAL: 15873
// When the GPU is suspended (railgated), the PM runtime suspend callback should
// suspend all devfreq devices, and the devfreq cycle should not be triggered.
//
// However, users are still able to change the devfreq governor from the
// sysfs interface and indirectly invoke the update_devfreq function, which
// will further call the target callback function.
//
// Stop the process early here, before clk_set_rate/clk_get_rate, since these
// calls, served by BPMP, will wake the GPU.
echo "nvhost_podgov" |sudo tee /sys/devices/platform/bus@0/d0b0000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/gpu-gpc-0/devfreq/gpu-gpc-0/governor
echo "performance" |sudo tee /sys/devices/platform/bus@0/d0b0000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/gpu-gpc-0/devfreq/gpu-gpc-0/governor
Alternatively, install jetson_stats (jtop) from source to get the recently updated version for Thor, which can toggle railgating and 3D scaling on the 2GPU page.
I have tried setting the governor and also changing the power control from auto to on.
I also tried installing the newest jetson-stats from source to enable and disable railgating and 3D scaling.
Unfortunately, none of this seems to have any effect.
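To confirm whether such writes took effect, the governor, clocks, and runtime power-management state can be read back from sysfs. This bash sketch reads the standard Linux devfreq attributes (governor, cur_freq, max_freq) and runtime PM attributes (power/control, power/runtime_status). The default paths are the ones quoted in this thread and are assumed to exist on your image; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Print one "name: value" line per attribute in a sysfs directory,
# or "(not available)" when the attribute is missing or unreadable.
print_attrs() {
  local dir=$1 attr
  shift
  for attr in "$@"; do
    if [[ -r $dir/$attr ]]; then
      printf '%s: %s\n' "$attr" "$(<"$dir/$attr")"
    else
      printf '%s: (not available)\n' "$attr"
    fi
  done
}

# Show devfreq and runtime PM state for the GPU PCI device.
show_gpu_pm_state() {
  local dev=${1:-/sys/devices/platform/bus@0/d0b0000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0}
  local devfreq=${2:-$dev/gpu-gpc-0/devfreq/gpu-gpc-0}
  print_attrs "$devfreq" governor cur_freq max_freq
  print_attrs "$dev/power" control runtime_status
}

# Usage on the target: show_gpu_pm_state
```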
The verbose output from nvpmodel still shows:
NVPM VERB: Don’t refer to GPU_POWER_GATING sysfs when using nvidia.ko
NVPM VERB: PARAM GPU_POWER_GATING: ARG GPU_PG_MASK: PATH /dev/null: REAL_VAL: 0 CONF_VAL: 15873
NVPM VERB: PARAM GPU_POWER_CONTROL_ENABLE: ARG GPU_PWR_CNTL_EN: PATH /sys/devices/platform/bus@0/d0b0000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/power/control: REAL_VAL: on CONF_VAL: on
This seems to indicate that REAL_VAL for GPU_PG_MASK is never set, and that the system cannot set it because its path points to /dev/null.
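The same check can be applied to every parameter in the verbose log. This bash sketch reads lines in the `NVPM VERB: PARAM ...: ARG ...: PATH ...: REAL_VAL: ... CONF_VAL: ...` format shown above and flags any parameter whose path is /dev/null or whose real value differs from the configured one. It assumes that exact line format; the command that produced the log is not shown in the thread, so the usage line is only an assumption.

```bash
#!/usr/bin/env bash
# Flag nvpmodel PARAM lines whose PATH is /dev/null or whose REAL_VAL != CONF_VAL.
# Lines that do not match the PARAM format (e.g. plain notices) are skipped.
check_nvpm_params() {
  local line
  local re='PARAM ([A-Z0-9_]+): ARG ([A-Z0-9_]+): PATH (.*): REAL_VAL: (.*) CONF_VAL: (.*)$'
  while IFS= read -r line; do
    [[ $line =~ $re ]] || continue
    local param=${BASH_REMATCH[1]} path=${BASH_REMATCH[3]}
    local real=${BASH_REMATCH[4]} conf=${BASH_REMATCH[5]}
    if [[ $path == /dev/null ]]; then
      printf '%s: not applied (PATH is /dev/null, REAL_VAL=%s, CONF_VAL=%s)\n' "$param" "$real" "$conf"
    elif [[ $real != "$conf" ]]; then
      printf '%s: mismatch (REAL_VAL=%s, CONF_VAL=%s)\n' "$param" "$real" "$conf"
    else
      printf '%s: ok (%s)\n' "$param" "$real"
    fi
  done
}

# Assumed usage: sudo nvpmodel -q --verbose 2>&1 | check_nvpm_params
```

Applied to the three log lines above, it reports GPU_POWER_GATING as not applied and GPU_POWER_CONTROL_ENABLE as ok.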
All other settings work just fine. If I lower the GPU's maximum frequency, we do see a drop in power draw. However, we want to disable part of the GPU to simulate the T4000 for internal benchmarks.
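For reference, capping the clock (which lowers power but does not reduce the number of active TPCs, so it is not a T4000 simulation) can go through the standard devfreq `max_freq` attribute. This bash sketch picks the highest entry from devfreq's `available_frequencies` list that does not exceed a target. The 1 GHz target and the assumption that `max_freq` is writable on this node are hypothetical; `pick_max_freq` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Print the highest frequency (Hz) in a space-separated list that is <= target.
# Returns 1 if every listed frequency is above the target.
pick_max_freq() {
  local target=$1 avail=$2 best="" f
  for f in $avail; do
    if (( f <= target )) && { [[ -z $best ]] || (( f > best )); }; then
      best=$f
    fi
  done
  [[ -n $best ]] || return 1
  printf '%s\n' "$best"
}

# Hypothetical usage on the target (1 GHz target is only an example):
# dir=/sys/devices/platform/bus@0/d0b0000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/gpu-gpc-0/devfreq/gpu-gpc-0
# f=$(pick_max_freq 1000000000 "$(cat "$dir/available_frequencies")")
# echo "$f" | sudo tee "$dir/max_freq"
```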
I have now tried a clean install of 38.2.1, downloaded and replaced nvpmodel.conf, and got exactly the same result: the same output from nvpmodel -q as before.
Has anyone else succeeded in setting the number of TPCs in use?
I’ve checked with the internal team: this is not an error; it’s expected.
We previously used nvgpu as the kernel driver, which exposes the GPU_POWER_GATING sysfs node.
However, Thor platforms on r38.x use nvidia.ko as the kernel driver, which so far does not expose a TPC sysfs node.
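To see which kernel driver is bound to the GPU on a given image, the standard PCI sysfs `driver` symlink can be resolved. This bash sketch defaults to the 0000:01:00.0 device from this thread; that the reported name is exactly `nvidia` or `nvgpu` is an assumption, and `bound_driver` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Print the name of the kernel driver bound to a PCI device by resolving
# its sysfs "driver" symlink; return 1 if no driver is bound.
bound_driver() {
  local dev=${1:-/sys/bus/pci/devices/0000:01:00.0}
  if [[ -L $dev/driver ]]; then
    basename "$(readlink "$dev/driver")"
  else
    echo "no driver bound: $dev" >&2
    return 1
  fi
}

# Usage on the target: bound_driver
```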
Alternatively, you can configure them through nvpmodel.conf directly.
e.g.
Unfortunately, even with this parameter set directly, we can still push the Thor upwards of 110W and get the over-current warning. I'm not sure how to check why the TPCs are not being disabled.
I think we will release the 70W power mode configuration for Thor in the next JP7.1 release.
Please also note that 70W is a reference power mode, estimated for typical workloads, not for heavy stress workloads.
So if a stress test is run, power can still exceed 70W.