1 BSP environment:
TX2, JetPack 4.6, L4T R32.6.1, kernel 4.9, aarch64
TX2 (P3310)
2 Background:
After we flash a brand new TX2, the device boots up and requires us to do the nvpmodel configuration.
3 Problem:
3.1 What is the difference between MAXN, MAXQ, MAXP_CORE_ALL, and MAXP_CORE_ARM, and what is their effect? Which one should we select for our project?
3.2 How do we configure it at runtime using the command line interface?
3.3 We would like to record the nvpmodel configuration somewhere, so that when we flash a new device we can skip the nvpmodel configuration step in the desktop environment.
Please refer to NVIDIA Jetson Linux Driver Package Software Features : Clock Frequency and Power Management | NVIDIA Docs
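For question 3.2, a minimal sketch of the usual runtime usage (standard nvpmodel and jetson_clocks commands on L4T; the mode numbers follow the TX2 table in that document):
$ sudo nvpmodel -q          # query the currently active power mode
$ sudo nvpmodel -m 0        # switch to mode 0 (MAXN) at runtime
$ sudo jetson_clocks        # optionally pin clocks to the maximum allowed by the current mode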
Hi
if we choose the default setting (please refer to the following picture), the tty log shows CPU1 and CPU2 shutting down.
[ 1068.366532] CPU1: shutdown
[ 1068.438298] CPU2: shutdown
The TX2 has a quad-core Cortex-A57 cluster and a dual-core Denver cluster. So which cores are CPU1 and CPU2, and why? Does the system use the Cortex-A57 cores by default? Is there a difference in process scheduling between the Cortex-A57 and Denver cores, or are the Denver cores bound to the GPU, as we guess?
With the default nvpmodel selection, will CPU1 and CPU2 be shut down forever? If not, when will they be woken up?
FYI, see the Clock Frequency and Power Management documentation referenced above.
Hi
this doc says
Once you set a power mode, the module stays in that mode until you change it. The mode persists across power cycles and SC7.
Does it mean that in the default MAXP_CORE_ARM mode (mode 3), CPU1 and CPU2 will be shut down forever? If so, why does “htop” still show 6 cores here? In other words, how can we prove that all CPUs are working? Is there a command for that?
and what is SC7?
Also, according to the above doc, MAXN (mode 0) has 6 cores online, but what is MAXN’s power budget in watts? We do not see it in the docs. Furthermore, are there any side effects if we choose mode 0? What should we test and watch for to make sure our device stays stable?
hello Henry.Lou,
we disable the Denver cores by default because the Denver 2 CPU cores have different performance characteristics than the ARM Cortex-A57 cores.
CPU1 and CPU2 are the Denver cores. you may also check tegrastats; they should show 0% usage even when running a CPU stress test.
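As a quick way to confirm which cores are online versus merely isolated (a sketch using standard Linux sysfs paths, nothing TX2-specific):
$ cat /sys/devices/system/cpu/online    # cores that are powered on, e.g. 0-5 in MAXN
$ tegrastats --interval 1000            # per-core load and frequency; isolated Denver cores stay at 0%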
please also refer to the Release Notes (r32.5.1), section [5.15 Increased Kernel Launch Latency on Denver 2 Cores].
you need to remove isolcpus=1-2 from the kernel command line to enable the Denver cores.
for example, modify the configuration file, p2771-0000.conf.common, by removing the isolcpus=1-2 token from the CMDLINE_ADD line below, and re-flash the TX2 platform.
CMDLINE_ADD="console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 isolcpus=1-2";
SC7 means deep sleep.
please also see the developer guide, Chipset Power States, for reference.
MAXN configures the system for maximum performance; it does not constrain power consumption.
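If you want to see exactly what each mode defines (which cores are online, maximum clocks, and any power budget), the mode table can be read directly from the nvpmodel configuration; a sketch, assuming the default configuration path on L4T:
$ cat /etc/nvpmodel.conf    # each POWER_MODEL block lists the core and clock settings for MAXN, MAXQ, MAXP_*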
Hi
we use sudo nvpmodel -m 0 to change the nvpmodel power mode,
and then run tegrastats --interval 1000.
but CPU3 stays at 0%. Why? We expect all cores to be working.
RAM 1578/7859MB (lfb 1211x4MB) SWAP 0/3930MB (cached 0MB) CPU [26%@1266,100%@2035,0%@2034,22%@1267,20%@1268,19%@1266] EMC_FREQ 2%@1866 GR3D_FREQ 0%@114 APE 150 MTS fg 0% bg 0% PLL@26.5C MCPU@26.5C PMIC@50C Tboard@22C GPU@24C BCPU@26.5C thermal@25.5C Tdiode@22.75C VDD_SYS_GPU 59/59 VDD_SYS_SOC 779/779 VDD_4V0_WIFI 0/15 VDD_IN 5544/5570 VDD_SYS_CPU 1798/1811 VDD_SYS_DDR 1296/1292
hello Henry.Lou,
please execute tegrastats for confirmation. thanks
Hi
this is the output of the tegrastats command.
hello Henry.Lou,
please also try running a stress test to fully occupy the CPU resources, for example: $ stress --cpu 6
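In case the tool is not installed yet, a minimal sketch of the whole sequence (the stress package comes from the regular Ubuntu repositories):
$ sudo apt-get install stress
$ stress --cpu 6 &              # spawn 6 busy-loop workers in the background
$ tegrastats --interval 1000    # every online, non-isolated core should now show close to 100%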
Hi
we ran stress --cpu 6 &
and then ran tegrastats --interval 1000.
output:
CPU [100%@2020,0%@345,0%@345,100%@2019,100%@2016,100%@2018]
it seems that CPU2 and CPU3 do not work. Why?
hello Henry.Lou,
have you really enabled the Denver cores?
please dump /proc/cmdline and /boot/extlinux/extlinux.conf for confirmation.
Hi
we used nvpmodel -m 0 to enable 6 cores. Is that right?
root@ubuntu-desktop:/home/ubuntu# cat /proc/cmdline
console=ttyS0,115200 androidboot.presilicon=true firmware_class.path=/etc/firmware root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 console=ttyS0,115200n8 conso
root@ubuntu-desktop:/home/ubuntu# cat /boot/extlinux/extlinux.conf
TIMEOUT 30
DEFAULT primary
MENU TITLE L4T boot options
LABEL primary
MENU LABEL primary kernel
LINUX /boot/Image
INITRD /boot/initrd
APPEND ${cbootargs} quiet root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 isolcpus=1-2
# When testing a custom kernel, it is recommended that you create a backup of
# the original kernel and add a new entry to this file so that the device can
# fallback to the original kernel. To do this:
#
# 1, Make a backup of the original kernel
# sudo cp /boot/Image /boot/Image.backup
#
# 2, Copy your custom kernel into /boot/Image
#
# 3, Uncomment below menu setting lines for the original kernel
#
# 4, Reboot
# LABEL backup
# MENU LABEL backup kernel
# LINUX /boot/Image.backup
# INITRD /boot/initrd
# APPEND ${cbootargs}
hello Henry.Lou,
as I mentioned in my previous comment (#7), you need to remove isolcpus=1-2 from the kernel command line to enable the Denver cores.
since you still have it included in your APPEND line, that is why you are still seeing 0% on the Denver cores.
Hi
now we manually edited extlinux.conf and rebooted the device, but got the same result.
the following is the output after we rebooted the device.
root@ubuntu-desktop:/home/ubuntu# cat /boot/extlinux/extlinux.conf
TIMEOUT 30
DEFAULT primary
MENU TITLE L4T boot options
LABEL primary
MENU LABEL primary kernel
LINUX /boot/Image
INITRD /boot/initrd
APPEND ${cbootargs} quiet root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0
# When testing a custom kernel, it is recommended that you create a backup of
# the original kernel and add a new entry to this file so that the device can
# fallback to the original kernel. To do this:
#
# 1, Make a backup of the original kernel
# sudo cp /boot/Image /boot/Image.backup
#
# 2, Copy your custom kernel into /boot/Image
#
# 3, Uncomment below menu setting lines for the original kernel
#
# 4, Reboot
# LABEL backup
# MENU LABEL backup kernel
# LINUX /boot/Image.backup
# INITRD /boot/initrd
# APPEND ${cbootargs}
hello Henry.Lou,
could you please modify the configuration file, p2771-0000.conf.common, and re-flash the TX2 platform.
for example, remove the isolcpus=1-2 token from this line:
CMDLINE_ADD="console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 isolcpus=1-2";
There is more than one place kernel arguments can be added; extlinux.conf is just one of them. To verify the argument is actually removed, see if it occurs in the output from this command:
cat /proc/cmdline
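A slightly more targeted check, as a one-liner sketch with standard grep:
$ grep -o 'isolcpus=[^ ]*' /proc/cmdline    # prints isolcpus=1-2 while the cores are still isolated, and nothing once the argument is really gone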
Hi @linuxdev
in our project, the GPU is needed to compute point cloud data. Should we disable the Denver cores and reserve them? If yes, how do we do it, and how do we reserve the Denver cores for a specific process?
last but not least:
every time we flash a brand new TX2, we have to do the nvpmodel mode configuration. How can we save it in the source code or the SDK so that we do not need to do this configuration anymore?
In general, you mostly don’t want them reserved. Unless you have a purpose for doing so, I’d remove the “isolcpus=1-2”. However, generally speaking, this won’t stop anything from working either way; it just changes performance.
It isn’t your original question, but it might be worth explaining some background related to all this. This goes beyond a simple answer, but instead of just saying to use or not use isolcpus, you might find this interesting (ignore this if you don’t need to know why cores might be isolated).
First, the Denver cores do much the same as the other cores, but they have different characteristics. I think one reason they were added on the early models is because of the energy efficiency and flexibility. But overall, they are not that different from any other core.
So the question becomes one of which core to use and when. This takes you down the proverbial rabbit hole of schedulers and just what multitasking is. Any time you have more threads or processes than you have compute resources something has to wait while something else runs. Determining what runs and when is something end users rarely think about, but the scheduler does this constantly based on some rule.
A typical rule is that every process (I’ll not mention threads again, but threads are mostly interchangeable in the below when I say process) is given a time slice. It runs for that time, and then another process runs while the original is safely stored away wherever it left off. It takes time and memory to stop and store a process, and so it is best to not wildly change processes before significant work has been completed. Also, some parts of a process might need to be atomic, and interrupting them to store them is not practical; an example might be a process retrieving data from a hard drive…the hard drive is not going to stop and switch to reading some other sector and go back without error since it only reads in blocks. The scheduler is aware of atomic sections as well as having some rule for batching certain things together or in a row for efficiency. An example of batching is that cache RAM might be using the same memory among multiple threads of a single process gaining benefit by not invalidating the cache when moving to an entirely different process unrelated to those threads. The scheduler modifies schedules of execution based on concepts such as those.
The Denver cores might have increased latency in some respects, and yet they have benefits in other ways (such as low power consumption in low power models) for a given job. Disabling them for general computing and avoiding those cores can lower latency. However, if you were to run some particular process on those cores, you might find better power consumption and the same average performance despite the lower latency. Thus it could be of interest to use those cores specifically when you don’t care about increased latency, but do want decreased power. Plus they are extra cores, so you could do more work overall. But you might not use such a core for something like handling your mouse as a gamer.
Jetsons tend to be used for CUDA or other GPU-related specialty software. An RPi is fine with so many things, but often a Jetson user might be interested in putting their specialty software in the Denver cores while using the others for “regular” use. The scheduler does not have anything set up to be aware of this, but if you isolate those cores from the scheduler (that is what “isolcpus” does as a kernel argument…it configures an aspect of the scheduler so far as what it does in general), then those cores are reserved only for processes you’ve marked as having a CPU affinity for those cores. You can of course disable the isolcpus and just tell the scheduler to use the cores as if they are like any other core. If you don’t have any special performance objective, then I suggest just enabling the cores, as it takes special setup of processes to use the isolated cores. Then you don’t need to do anything and the cores are just used without effort. If you are benchmarking though, consider that improvements in either latency or power consumption might be achieved when isolating cores and assigning CPU affinity to particular processes.
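If you do keep isolcpus=1-2 and want to place a particular workload on the Denver cores, a minimal sketch with taskset (the program name ./point_cloud_app is only a placeholder for your own process):
$ taskset -c 1,2 ./point_cloud_app    # launch a process allowed to run only on cores 1 and 2 (the Denver cores)
$ taskset -cp 1,2 <pid>               # or change the affinity of an already-running process by PID
$ taskset -cp <pid>                   # query which cores a process may currently use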
The following is even further from your question; it is more detail on latency and classifications of schedulers. Before that, I’ll state that in any system the competition of threads or processes for compute resources might be termed “how nice the scheduler is to one process versus another process”. The nicer a process’s schedule is, the less that process gets when the competition needs time. One can be “less nice” to become king of the hill. Just be careful that something critical (e.g., disk drivers) doesn’t get a lower priority (disk drivers shouldn’t be too nice) than something that is atomic and won’t complete without disk data. It starts to remind me of time travel science fiction movies where there are unintended consequences when something in the past conflicts with something in the future, creating a paradox (in this case, a priority inversion might occur).
There is ordinary scheduling, a plain vanilla scheduler which is what you are used to on the Linux desktop PC and what most of the Jetson is working with for ordinary software. Then there are soft realtime extensions. Many processes are marked as a default class which is ordinary (it is a “class” a process is marked with, and a policy enforced for that class for the scheduler). If you have soft realtime extensions, then processes can be marked for priority, and although ordinary fairness of competition between processes occurs, those of the realtime class are given some minimal time slice no matter how unfair it is to the other processes. The soft realtime schedule you’ll see a lot on a PC is the audio playback. Audio really must get a certain time slice in order to prevent stuttering. Simple batch play of audio is unacceptable.
However, what would happen if some critical atomic section of an important driver were preempted for audio? You might find the system crashing, data corruption, and so on. This is why that particular realtime is “soft” realtime. It isn’t guaranteed. In order to get hard realtime (and ARM actually has several subclassifications of hard realtime), where priorities are absolutely guaranteed for given time slices, you need hardware assistance. You probably need hardware support and smart programming to deal with some atomic code sections and scheduling algorithms (resource requirements for realtime scheduling go up exponentially with the number of threads, and the ARM Cortex-R series CPUs tend to have actual scheduling hardware which doesn’t exist on Cortex-A and Cortex-M…I really like Cortex-R). Also, although cache RAM improves average performance, the hit or miss nature means it is bad for hard realtime because hard realtime is really about enforced absolute latency, and not about average throughput. Hard realtime means lower performance, but an absolute guarantee that you get what you want without failure, ever.
The RT kernel extensions you can get with the ARM Cortex-A (which is what most of a Jetson is, although it does have some Cortex-R you can’t normally reach) are soft realtime. The presence of the Denver cores can be used to tune average performance to higher levels without consuming as much power as regular cores use, but latency might be worse. Whether that latency matters to you or not depends on your use, but for most users they won’t be aware of that latency. Someone performing benchmarks will notice. If latency becomes an issue under load, then this is when you consider working with CPU affinity and keeping the Denver cores isolated, but scheduled for specific classes of processes (which you’d have to manually set up).
Some URLs which might be of interest:
- https://forums.developer.nvidia.com/t/threads-running-on-an-asymmetric-system-parker-soc-has-2-denver-cores-and-four-arm-cortex-a57/126242/5 (see the taskset content; this sets a process to a core).
- https://forums.developer.nvidia.com/t/irq-balancing/126244/6
- https://www.kernel.org/doc/Documentation/IRQ-affinity.txt (IRQ affinity docs from kernel.org; setting affinity does not require RT extensions, but if a process is the only one on a core, and there is no competition, then you have the ultimate realtime if you ignore cache hit/miss issues; see the sketch after this list).
- https://forums.developer.nvidia.com/t/cpu-at-0-performance/177769/3 (a bit more on taskset).
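Regarding the IRQ affinity document above, a minimal sketch of how that interface is normally used (the IRQ number 130 is hypothetical; look up the real one in /proc/interrupts for the device you care about):
$ cat /proc/interrupts                                   # find the IRQ number of the device of interest
$ cat /proc/irq/130/smp_affinity_list                    # cores currently allowed to service that IRQ
$ echo 1-2 | sudo tee /proc/irq/130/smp_affinity_list    # steer that IRQ onto cores 1-2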
Just to reiterate, isolating a core and then setting a process to that core is one step for tuning performance. This in itself does not have a realtime effect, other than the fact that no other process is competing (but that’s a rather large performance advantage, especially since cache will never be invalidated by another process). A process that needs a lot of average performance without consuming as much power, and is tolerant of some latency increase, would be a good candidate for Denver core affinity (and in many cases you wouldn’t care and could just schedule Denver cores like any other). Realtime itself implies setting classes to processes and informing the scheduler to use non-default priority algorithms.
Note that isolating cores does not disable cores, but it does tell the scheduler not to send anything there that is not marked for there.
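On the realtime class remark, a minimal sketch with chrt (the priority value 50 and the program name are illustrations only; this sets the soft-realtime SCHED_FIFO policy and does not by itself guarantee hard deadlines):
$ sudo chrt -f 50 ./point_cloud_app    # start a process under SCHED_FIFO at priority 50
$ chrt -p <pid>                        # query the scheduling policy and priority of a running process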