DualOTG problems along with DifferentialBaseController

We are trying to migrate to the DualOTG-V planning algorithm for better control. We were able to port our existing navigation stack to DualOTG; however, a problem arises with the DualOTG5 and DifferentialBaseController codelets:

core/math/so2.hpp@57: Given cos/sin vector is not normalized: -nan

Thread 5 (crashed)
0 libc.so.6 + 0x324f8
x0 = 0x0000000000000000 x1 = 0x0000007f73095d78
x2 = 0x0000000000000000 x3 = 0x0000000000000008
x4 = 0x0000000000000000 x5 = 0x0000007f73095d78
x6 = 0xffffffffffffffff x7 = 0xffffffffffffffff
x8 = 0x0000000000000087 x9 = 0xffffffffffffffff
x10 = 0xffffffffffffffff x11 = 0xffffffffffffffff
x12 = 0xffffffffffffffff x13 = 0xffffffffffffffff
x14 = 0x0000000000000008 x15 = 0x0000007f85882e08
x16 = 0x0000007f858b2780 x17 = 0x0000007f8484dfa8
x18 = 0x0000000000000794 x19 = 0x0000007f859d1000
x20 = 0x0000000000000006 x21 = 0x0000007f8484a000
x22 = 0x0000007f730964a0 x23 = 0x0000007f730964c0
x24 = 0x0000007f73096480 x25 = 0x0000007f73096530
x26 = 0x0000007f730961e0 x27 = 0x00000055aada19c0
x28 = 0x0000008381a6c545 fp = 0x0000007f73095d50
lr = 0x0000007f858b1484 sp = 0x0000007f73095d50
pc = 0x0000007f858b14f8
Found by: given as instruction pointer in context
1 libc.so.6 + 0x32480
fp = 0x0000007f73095e80 lr = 0x0000007f858b28d4
sp = 0x0000007f73095d60 pc = 0x0000007f858b1484
Found by: previous frame’s frame pointer
2 libc.so.6 + 0x338d0
fp = 0x0000007f73095fc0 lr = 0x0000007f846749c0
sp = 0x0000007f73095e90 pc = 0x0000007f858b28d4
Found by: previous frame’s frame pointer
3 libcontroller_module.so!isaac::controller::DifferentialBaseControl::tick() + 0x13c4
fp = 0x0000007f73096570 lr = 0x0000007f84948964
sp = 0x0000007f73095fd0 pc = 0x0000007f846749c0
Found by: previous frame’s frame pointer
4 main + 0xc95f4
x19 = 0x0000007f5a746950 x20 = 0x00000055a9e33268
x21 = 0x00000055a9e335e8 x22 = 0x0000007f730966d8
x23 = 0x0000007f85d6b7a8 x24 = 0x00000055a9e33268
x25 = 0x0000007f730965f0 x26 = 0x000000558d285e9c
x27 = 0x00000055a9e6fa50 x28 = 0x00000055a9e33268
fp = 0x0000007f73096600 sp = 0x0000007f73096b20
pc = 0x000000558d2805f8
Found by: call frame info

We analyzed the crash dump, and the crash comes from the DualOTG/DifferentialBaseController codelets, as I mentioned above. The problem is that we cannot get into those codelets to catch the exception.

We also disabled pid_controller, as it behaved worse in the DualOTG-V case.

Interestingly enough, it does not happen in LQR mode.

Would it be possible to patch this or release an update that catches those errors?

For context, the error indicates that there was a malformed SO2 (special orthogonal group in 2D), which is basically an angle encoded as cos theta and sin theta. The two values should naturally form a unit-length vector, but the data structure had NaNs or zeros in it, causing the issue.
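To make the failure mode concrete, here is a minimal sketch of the kind of check that error message suggests. It is not the Isaac implementation: `Rotation2`, `IsNormalized`, and `FromDirection` are hypothetical names, and the tolerance is an assumed value. It shows one way a NaN could arise (normalizing a zero-length direction) and why such a value fails a normalization check.

```cpp
#include <cmath>

// Hypothetical stand-in for a 2D rotation stored as (cos, sin).
// This is NOT the Isaac SO2 type, only an illustration.
struct Rotation2 {
  double cos_theta;
  double sin_theta;
};

// Returns true when (c, s) is finite and has unit length within a tolerance.
// The tolerance is an assumed value, not taken from so2.hpp.
bool IsNormalized(double c, double s, double tolerance = 1e-6) {
  const double squared_norm = c * c + s * s;
  return std::isfinite(squared_norm) &&
         std::abs(squared_norm - 1.0) <= tolerance;
}

// Builds a rotation from a direction vector by dividing by its length.
// For a zero-length direction this computes 0.0 / 0.0, which is NaN.
Rotation2 FromDirection(double x, double y) {
  const double norm = std::hypot(x, y);
  return Rotation2{x / norm, y / norm};
}
```

With this sketch, `FromDirection(0.0, 0.0)` produces NaN components, and `IsNormalized` rejects them, which matches the shape of the "not normalized: -nan" message.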

Catching the exception would not necessarily help. It is a fundamental math error, where some sort of questionable data made its way in somehow. DifferentialBaseControl appears to have been sent a trajectory (State proto) that had invalid values in it.

Did you notice any other odd values being fed into DualOTG5, perhaps an invalid State proto as input? Is this reproducible for you?

There is a differential-to-delta plan converter. I presume that somewhere in between there is a division by zero that produces the NaNs. In our case the crash leads to wild robot behaviour; catching the exception and waiting for the next plan frame, or adding infinitesimal values, would be more robust. Because pretty much all the modules in ISAAC are closed source, this becomes a problem: we now have to intercept the plan coming out of the DifferentialToDelta plan converter and analyze it.
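A rough sketch of what such an interception step could look like, assuming the plan can be copied into a plain struct first. `PlanState`, `IsPlanValid`, and `PlanGate` are hypothetical names, and the fields do not mirror the real Isaac plan proto. The idea is to reject any plan containing a non-finite value and keep using the last valid plan until the next good frame arrives.

```cpp
#include <cmath>
#include <optional>
#include <vector>

// Hypothetical plan state; the real Isaac plan proto has different fields.
struct PlanState {
  double x;
  double y;
  double heading;
  double linear_speed;
  double angular_speed;
};

// True when every field of the state is a finite number (no NaN, no infinity).
bool IsFinite(const PlanState& s) {
  return std::isfinite(s.x) && std::isfinite(s.y) &&
         std::isfinite(s.heading) && std::isfinite(s.linear_speed) &&
         std::isfinite(s.angular_speed);
}

// A plan is treated as valid when it is non-empty and all states are finite.
bool IsPlanValid(const std::vector<PlanState>& plan) {
  if (plan.empty()) return false;
  for (const PlanState& s : plan) {
    if (!IsFinite(s)) return false;
  }
  return true;
}

// Remembers the most recent valid plan and ignores invalid ones.
class PlanGate {
 public:
  // Returns true if the plan was accepted and stored.
  bool Offer(const std::vector<PlanState>& plan) {
    if (!IsPlanValid(plan)) return false;
    last_valid_ = plan;
    return true;
  }

  // Empty until the first valid plan has been offered.
  const std::optional<std::vector<PlanState>>& last_valid() const {
    return last_valid_;
  }

 private:
  std::optional<std::vector<PlanState>> last_valid_;
};
```

A wrapper codelet would call `Offer` on each incoming plan and forward `last_valid()` downstream. This only guards against bad values present at the interception point; it cannot catch a NaN that is computed later inside the controller.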

After long debugging and reverse engineering, we are still unable to pinpoint what is causing this. Our original thought was that bad data had crept into the differential plan, so we intercepted it and injected Gaussian noise to make the values non-zero. It still crashes. After more debugging, we found a tryGetLatest call used for pose estimation and tried injecting noise there too; the robot then behaves like a drunken man, yet the bug still creeps in. The interesting part is that it does not happen in LQR, although I presume libcontroller_module.so is the same one used with DualOTG5. My question is: would you be able to release the DifferentialBaseController.cpp source, at least for browsing? If not, could you at least point out the important function calls that use so2.hpp from the engine?
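One possible explanation for why the noise injection did not help, offered as a guess rather than a diagnosis: if a value is already NaN when it is intercepted, adding noise leaves it NaN, since any arithmetic with NaN yields NaN. The sketch below uses hypothetical helper names (`AddNoise`, `SanitizeOr`) to contrast perturbing a value with replacing a non-finite one.

```cpp
#include <cmath>
#include <random>

// Adds zero-mean Gaussian noise to a value. Hypothetical helper for illustration.
// If `value` is NaN, the result is still NaN.
double AddNoise(double value, double sigma, std::mt19937& rng) {
  std::normal_distribution<double> noise(0.0, sigma);
  return value + noise(rng);
}

// Replaces a non-finite value (NaN or infinity) with a fallback
// and passes finite values through unchanged.
double SanitizeOr(double value, double fallback) {
  return std::isfinite(value) ? value : fallback;
}
```

If the NaN is instead produced inside the closed-source codelet after the interception point, neither helper would prevent it, which would be consistent with the crash persisting.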