NVIDIA Jetson Xavier CPU core maxing out issue

I am trying to run Isaac navigation and Isaac Cartographer on the NVIDIA Jetson Xavier. I am also running ROS in the background because I am using some RostoIsaac and IsaactoRos bridges. The usage of one of the cores keeps reaching 100 to 300% or more.
When this happens, the Xavier loses its internet connection.
Has anybody faced this issue? How do I sort out this issue?

@hemals @Swapnesh @sdesai ^^^ Thanks

Are you running in MAXN power mode by any chance? Are all of the cores maxing out, or just one? (300% suggests that three cores are fully engaged.) It is possible that the process or thread handling network traffic is getting starved, but that is just a guess. jtop might give you a better idea of what’s going on when this happens.
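
A note on the numbers: top reports a process’s %CPU summed over all cores, which is why a single process can show 300% (roughly three cores’ worth), while any one core tops out at 100%. If you want to log per-core load yourself, for example next to the moments the connection drops, here is a minimal sketch that reads the cumulative per-core counters in Linux’s /proc/stat twice and reports how busy each core was in between. It assumes the standard /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal) and C++17; the function names are our own, not part of jtop or the Isaac SDK.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <thread>

// Cumulative jiffy counters for one "cpuN" line of /proc/stat.
struct CpuTimes {
  std::uint64_t idle = 0;   // idle + iowait
  std::uint64_t total = 0;  // user + nice + system + idle + iowait + irq + softirq + steal
};

// Parse the per-core "cpuN" lines; the aggregate "cpu" line is skipped.
// guest/guest_nice are not summed because the kernel already counts them in user/nice.
std::map<std::string, CpuTimes> parse_proc_stat(std::istream& in) {
  std::map<std::string, CpuTimes> cores;
  std::string line;
  while (std::getline(in, line)) {
    if (line.rfind("cpu", 0) != 0 || line.size() < 4 || line[3] == ' ') continue;
    std::istringstream fields(line);
    std::string name;
    fields >> name;
    std::uint64_t v[8] = {};  // user nice system idle iowait irq softirq steal
    for (auto& x : v) fields >> x;
    CpuTimes t;
    t.idle = v[3] + v[4];
    for (auto x : v) t.total += x;
    cores[name] = t;
  }
  return cores;
}

// Percentage of the interval during which a core was not idle (0..100).
double busy_percent(const CpuTimes& before, const CpuTimes& after) {
  if (after.total <= before.total) return 0.0;  // no time elapsed (e.g. core offline)
  const std::uint64_t total = after.total - before.total;
  // iowait can occasionally go backwards, so clamp the idle delta into [0, total].
  std::uint64_t idle = after.idle > before.idle ? after.idle - before.idle : 0;
  if (idle > total) idle = total;
  return 100.0 * static_cast<double>(total - idle) / static_cast<double>(total);
}

// Take two snapshots `interval` apart and print each core's busy percentage.
void print_core_usage(std::chrono::milliseconds interval) {
  assert(interval.count() > 0 && "sampling interval must be positive");
  std::ifstream first("/proc/stat");
  const auto before = parse_proc_stat(first);
  std::this_thread::sleep_for(interval);
  std::ifstream second("/proc/stat");
  const auto after = parse_proc_stat(second);
  for (const auto& [name, times] : after) {
    const auto it = before.find(name);
    if (it != before.end()) {
      std::cout << name << ": " << busy_percent(it->second, times) << "%\n";
    }
  }
}
```

Calling `print_core_usage(std::chrono::milliseconds(1000))` in a loop prints one `cpuN: <percent>%` line per core every second, which makes it easy to see whether the hot core really moves around.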

There are ways to reserve cores, or “shield” them from your Isaac SDK app, so that they remain available to the rest of the system. If CPU overload is the cause here, there are options.
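
The simplest form of this is to restrict the Isaac SDK process’s CPU affinity so that it never runs on a few cores, leaving them to everything else. Below is a minimal sketch using the Linux `sched_getaffinity`/`sched_setaffinity` calls, assuming glibc; the function name `leave_cores_free` and the choice of reserved cores are hypothetical. It only keeps this process off the reserved cores (other processes may still use them, which is the point here), and it has no effect on work that runs inside the Isaac app itself. If the app later pins its own threads elsewhere, that would override it.

```cpp
#include <sched.h>  // sched_getaffinity, sched_setaffinity, cpu_set_t, CPU_* macros
#include <cassert>
#include <cerrno>
#include <vector>

// Remove the `reserved` cores from the calling thread's CPU affinity mask.
// Threads and child processes created afterwards inherit the mask, and it is
// kept across execve(), so calling this in a single-threaded launcher right
// before exec'ing the Isaac SDK app keeps the whole app off those cores.
// Returns 0 on success, otherwise an errno value.
int leave_cores_free(const std::vector<int>& reserved) {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return errno;
  for (int core : reserved) {
    assert(core >= 0 && core < CPU_SETSIZE && "core index out of range");
    CPU_CLR(core, &mask);
  }
  // Refuse to leave the process with nowhere to run.
  if (CPU_COUNT(&mask) == 0) return EINVAL;
  if (sched_setaffinity(0, sizeof(mask), &mask) != 0) return errno;
  return 0;
}
```

For example, on the 8-core Xavier, `leave_cores_free({0, 1})` followed by `execvp` of the app has roughly the same effect as launching it with `taskset -c 2-7 <app>` from the shell.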

@hemals Yes, I’m running the Xavier in MAXN power mode. Only one or two cores max out, and which ones is random: core 1 at one point, core 7 at another, and it keeps changing.
Reserving cores for network traffic, or shielding them from the Isaac SDK, sounds very useful for us. Could you point us to any reference links or articles on how to do that?

Is the network traffic being handled by an Isaac SDK component, though? If so, CPU shielding won’t help, since it would only keep cores away from the Isaac SDK, including that component. Setting thread priorities should be effective instead: when the system is under load, the network connection components should then not be starved.

Isaac SDK does not have a way to set component priorities in the scheduler, unfortunately. You could spawn your own thread from the component that manages the actual comms and set its thread priority higher than any of the threads in the Isaac SDK scheduler pool.

Switching to an RT kernel and adding something like the following to that component should resolve the issue.

+        this->op_thread = std::make_unique<std::thread>(op_main, this, this->running.get_future());
+        if ((get_core_idx() > 0) &&
+            (static_cast<unsigned int>(get_core_idx()) < std::thread::hardware_concurrency())) {
+            // Pin the comms thread to its own core.
+            cpu_set_t core_set;
+            CPU_ZERO(&core_set);
+            CPU_SET(get_core_idx(), &core_set);
+            LOG_DEBUG("CPU core: %d", get_core_idx());
+            // pthread_* calls return an error number instead of setting errno.
+            int rc = pthread_setaffinity_np(this->op_thread->native_handle(), sizeof(cpu_set_t), &core_set);
+            if (rc != 0) {
+                LOG_DEBUG("unable to set CPU affinity: %s", std::strerror(rc));
+            }
+
+            // Give the thread the highest SCHED_FIFO priority
+            // (requires CAP_SYS_NICE or a sufficient RLIMIT_RTPRIO).
+            const int thread_priority = sched_get_priority_max(SCHED_FIFO);
+            if (thread_priority == -1) {
+                LOG_DEBUG("unable to get maximum possible thread priority: %s", std::strerror(errno));
+            } else {
+                sched_param thread_param{};
+                thread_param.sched_priority = thread_priority;
+                rc = pthread_setschedparam(this->op_thread->native_handle(), SCHED_FIFO, &thread_param);
+                if (rc != 0) {
+                    LOG_DEBUG("unable to set realtime scheduling: %s", std::strerror(rc));
+                }
+            }
+        }

@hemals Thanks for the suggestion above!
But since we’re working with the Isaac SDK, aren’t we supposed to avoid manually multi-threading our nodes/components, according to the Isaac documentation?
Also, if internet connectivity is provided by some other process that comes installed with the system (something we didn’t write), how are we supposed to modify that?

It is not recommended, but you can multi-thread your nodes, with the burden of maintaining that concurrency yourself, of course. I had understood that a network client running in an Isaac SDK component was being starved and thus causing the connection to time out. If the internet connectivity is instead maintained by another process, you could have your Isaac SDK app “play nice” with other processes on the system by setting ulimits when launching it. This would ensure that your Isaac SDK app does not starve other processes on the system.
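
`ulimit` is the shell front end to `setrlimit(2)`. Here is a minimal sketch of a launcher that applies such limits before exec’ing the Isaac SDK app, assuming Linux. It sets `RLIMIT_RTPRIO` to 0 (what `ulimit -r 0` does), so that an unprivileged app cannot switch its threads to SCHED_FIFO and starve the network stack. It also raises the nice value with `setpriority(2)` (the call behind `nice`); that second step is our addition, not something stated in the thread. The function names and the nice value used are hypothetical, and a process running as root can still override both settings.

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit, getpriority, setpriority, RLIMIT_RTPRIO
#include <unistd.h>        // execvp
#include <algorithm>
#include <cassert>
#include <cerrno>

// Lower this process's claim on the CPU before it becomes the Isaac SDK app.
//  - RLIMIT_RTPRIO = 0 (what `ulimit -r 0` does): an unprivileged process can no
//    longer switch itself to a real-time policy such as SCHED_FIFO.
//  - nice value of at least `min_nice` (what `nice` does): under the default
//    scheduler the process gets a smaller CPU share when others are runnable.
// Both settings survive execve() and are inherited by threads and children created
// afterwards. Call this while the launcher is still single-threaded, because on
// Linux setpriority(PRIO_PROCESS, 0, ...) only changes the calling thread.
// Returns 0 on success, otherwise an errno value.
int play_nice(int min_nice) {
  rlimit rt{};
  rt.rlim_cur = 0;
  rt.rlim_max = 0;
  if (setrlimit(RLIMIT_RTPRIO, &rt) != 0) return errno;

  errno = 0;
  const int current = getpriority(PRIO_PROCESS, 0);
  if (current == -1 && errno != 0) return errno;  // -1 is also a valid nice value
  // Only ever raise the nice value: lowering it requires privileges.
  if (setpriority(PRIO_PROCESS, 0, std::max(current, min_nice)) != 0) return errno;
  return 0;
}

// Apply play_nice() and then replace this process with `argv` (e.g. the Isaac SDK
// app). Returns only if something failed, with an errno value.
int exec_nicely(int min_nice, char* const argv[]) {
  assert(argv != nullptr && argv[0] != nullptr && "need a command to run");
  if (const int rc = play_nice(min_nice); rc != 0) return rc;
  execvp(argv[0], argv);
  return errno;  // execvp only returns on failure
}
```

A launcher’s `main` could simply `return exec_nicely(10, argv + 1);`, which is close to running `ulimit -r 0; exec nice -n 10 <app>` in a shell (note that `nice -n` adjusts the current value rather than setting it).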

@hemals Thanks for the quick response on this issue.
ulimit is an interesting command: it sets or reports the user’s resource limits, which could help us avoid starving the internet connectivity. I’ll try it out today to check whether it works for us.