SensorManager: SensorTimeouts on DRIVE AGX with custom IMU-Plugin

Software Version
DRIVE OS Linux 5.2.0
DRIVE OS Linux 5.2.0 and DriveWorks 3.5
NVIDIA DRIVE™ Software 10.0 (Linux)
NVIDIA DRIVE™ Software 9.0 (Linux)
other DRIVE OS version
other

Target Operating System
Linux
QNX
other

Hardware Platform
NVIDIA DRIVE™ AGX Xavier DevKit (E3550)
NVIDIA DRIVE™ AGX Pegasus DevKit (E3550)
other

SDK Manager Version
1.4.1.7402
other

Host Machine Version
native Ubuntu 18.04
other

Hi,

We are having trouble using SensorManager on DRIVE AGX Pegasus in combination with a custom CAN-based IMU sensor plugin. Here is a minimal example of the code:

#include <iostream>
#include <string>

// DriveWorks headers (paths as in the DriveWorks 2.x/3.x samples; adjust to your version)
#include <dw/core/Context.h>
#include <dw/sensors/Sensors.h>
#include <dw/sensors/sensormanager/SensorManager.h>

int main(int argc, char** argv)
{
    (void)argc;
    (void)argv;

    dwContextHandle_t m_ctx      = DW_NULL_HANDLE;
    dwSALHandle_t m_sal          = DW_NULL_HANDLE;
    dwSensorManagerHandle_t m_sm = DW_NULL_HANDLE;

    // Bring up the SDK context, the sensor abstraction layer and SensorManager
    dwContextParameters sdkParams = {};
    dwInitialize(&m_ctx, DW_VERSION, &sdkParams);
    dwSAL_initialize(&m_sal, m_ctx);
    dwSensorManager_initialize(&m_sm, 1000, m_sal);

    // Register the custom IMU plugin; decoder-path names the plugin .so
    std::string protocol = "imu.custom";
    std::string params   = "decoder-path=libimu_plugin.so,can-proto=can.socket,device=can2";
    dwSensorParams sensorParams{};
    sensorParams.parameters = params.c_str();
    sensorParams.protocol   = protocol.c_str();
    dwSensorManager_addSensor(sensorParams, 0, m_sm);

    dwSensorManager_start(m_sm);

    dwStatus ret               = DW_SUCCESS;
    const dwSensorEvent* event = nullptr;

    while (ret == DW_SUCCESS || ret == DW_TIME_OUT)
    {
        // Timeout is given in microseconds (1000 us = 1 ms)
        ret = dwSensorManager_acquireNextEvent(&event, 1000, m_sm);

        if (ret == DW_TIME_OUT)
        {
            std::cout << "timeout" << std::endl;
            continue;
        }

        // Any other failure leaves no valid event to inspect or release
        if (ret != DW_SUCCESS)
        {
            break;
        }

        switch (event->type)
        {
        case DW_SENSOR_CAMERA: break;
        case DW_SENSOR_CAN: break;
        case DW_SENSOR_RADAR: break;
        case DW_SENSOR_TIME: break;
        case DW_SENSOR_DATA: break;
        case DW_SENSOR_COUNT: break;
        case DW_SENSOR_LIDAR: break;
        case DW_SENSOR_IMU:
        {
            std::cout << "received imu event" << std::endl;
            break;
        }
        case DW_SENSOR_GPS:
        {
            std::cout << "received gps event" << std::endl;
            break;
        }
        default: break;
        }

        ret = dwSensorManager_releaseAcquiredEvent(event, m_sm);
    }

    // Tear down in reverse order of initialization
    dwSensorManager_stop(m_sm);
    dwSensorManager_release(m_sm);

    dwSAL_release(m_sal);
    dwRelease(m_ctx);

    return 0;
}

On Host-PC this works fine:

christian@NvidiaHost:~$ ./install_isolated/lib/sensor_manager/minimal_example_node
received imu event
received imu event
received imu event
received imu event
received imu event
received imu event
received imu event
received imu event
received imu event
received imu event
received imu event
received imu event

On DRIVE AGX, though, we get sensor timeouts:

nvidia@tegra-ubuntu:~$ ./install_isolated/lib/sensor_manager/minimal_example_node
timeout
timeout
timeout
timeout
timeout
timeout
timeout
timeout
timeout
timeout
timeout
timeout

We can confirm via candump can2 that correct data is arriving on the can2 channel. In fact, sample_imu_logger works perfectly fine with our IMU plugin on DRIVE AGX:

nvidia@tegra-ubuntu:~$ /usr/local/driveworks/bin/sample_imu_logger --driver=imu.custom --params=decoder-path=libimu_plugin.so,can-proto=can.socket,device=can2
[21-09-2019 03:11:22] Platform: Detected DDPX - Tegra A
[21-09-2019 03:11:22] TimeSource: monotonic epoch time offset is 1569027512042074
[21-09-2019 03:11:22] PTP Time is available from NVPPS Driver
[21-09-2019 03:11:24] Platform: number of GPU devices detected 2
[21-09-2019 03:11:24] Platform: currently selected GPU device discrete ID 0
[21-09-2019 03:11:24] SDK: Resources mounted from /usr/local/driveworks-2.2/data/
[21-09-2019 03:11:24] SDK: Create NvMediaDevice
[21-09-2019 03:11:24] egl::Display: found 2 EGL devices
[21-09-2019 03:11:24] egl::Display: use drm device: drm-nvdc
[21-09-2019 03:11:24] TimeSource: monotonic epoch time offset is 1569027512042074
[21-09-2019 03:11:24] PTP Time is available from NVPPS Driver
[21-09-2019 03:11:24] Initialize DriveWorks SDK v2.2.3136
[21-09-2019 03:11:24] Release build with GNU 7.3.1 from heads/buildbrain-branch-0-gca7b4b26e65 against Drive PDK v5.1.6.1
[21-09-2019 03:11:24] SensorFactory::createSensor() -> imu.custom, decoder-path=libimu_plugin.so,can-proto=can.socket,device=can2
[21-09-2019 03:11:24] SensorFactory::createSensor() -> can.socket, decoder-path=libimu_plugin.so,can-proto=can.socket,device=can2
[21-09-2019 03:11:24] CANSocket: Cannot get current state of hardware time stamping: ioctl(SIOCGHWTSTAMP, can2) -> Operation not supported
[21-09-2019 03:11:24] CANSocket: software based timestamps will be used for can2
[21-09-2019 03:11:24] CANSocket: use SW based timestamps for can2
[21-09-2019 03:11:24] CANSocket: started can2
[1569028284963339] Orientation(R:-3.30422 P:0.931091 Y:-7.95033 ) OrientationQuaternion(X:-0.0281983 Y:0.0101013 Z:-0.0690613 W:0.997132 )
[1569028284963743] Gyro(X:0.002 Y:-0.002 Z:0.002 )
[1569028284964273] Acceleration(X:-0.117 Y:-0.5226 Z:9.8124 )
[1569028284964351] Magnetometer(X:-0.0527344 Y:0.90918 Z:-0.120117 )
[1569028284963339] Orientation(R:-3.30422 P:0.931091 Y:-7.95033 ) OrientationQuaternion(X:-0.0281983 Y:0.0101013 Z:-0.0690613 W:0.997132 )
[1569028284963743] Gyro(X:0.002 Y:-0.002 Z:0.002 )
[1569028284964273] Acceleration(X:-0.117 Y:-0.5226 Z:9.8124 )
[1569028284964351] Magnetometer(X:-0.0527344 Y:0.90918 Z:-0.120117 )
[1569028284963339] Orientation(R:-3.30422 P:0.931091 Y:-7.95033 ) OrientationQuaternion(X:-0.0281983 Y:0.0101013 Z:-0.0690613 W:0.997132 )
[1569028284963743] Gyro(X:0.002 Y:-0.002 Z:0.002 )
[1569028284964273] Acceleration(X:-0.117 Y:-0.5226 Z:9.8124 )
[1569028284964351] Magnetometer(X:-0.0527344 Y:0.90918 Z:-0.120117 )
[1569028284963339] Orientation(R:-3.30422 P:0.931091 Y:-7.95033 ) OrientationQuaternion(X:-0.0281983 Y:0.0101013 Z:-0.0690613 W:0.997132 )
[1569028284963743] Gyro(X:0.002 Y:-0.002 Z:0.002 )
[1569028284964273] Acceleration(X:-0.117 Y:-0.5226 Z:9.8124 )
[1569028284964351] Magnetometer(X:-0.0527344 Y:0.90918 Z:-0.120117 )
[1569028284963339] Orientation(R:-3.30422 P:0.931091 Y:-7.95033 ) OrientationQuaternion(X:-0.0281983 Y:0.0101013 Z:-0.0690613 W:0.997132 )
[1569028284963743] Gyro(X:0.002 Y:-0.002 Z:0.002 )
[1569028284964273] Acceleration(X:-0.117 Y:-0.5226 Z:9.8124 )
[1569028284964351] Magnetometer(X:-0.0527344 Y:0.90918 Z:-0.120117 )
[1569028284963339] Orientation(R:-3.30422 P:0.931091 Y:-7.95033 ) OrientationQuaternion(X:-0.0281983 Y:0.0101013 Z:-0.0690613 W:0.997132 )
[1569028284963743] Gyro(X:0.002 Y:-0.002 Z:0.002 )
[1569028284964273] Acceleration(X:-0.117 Y:-0.5226 Z:9.8124 )
[1569028284964351] Magnetometer(X:-0.0527344 Y:0.90918 Z:-0.120117 )
[1569028284963339] Orientation(R:-3.30422 P:0.931091 Y:-7.95033 ) OrientationQuaternion(X:-0.0281983 Y:0.0101013 Z:-0.0690613 W:0.997132 )

Strangely enough, a couple of days ago our code with SensorManager randomly started working on DRIVE AGX without changing anything in our program or our vehicle architecture. The next day, it did not work anymore. This is really strange behaviour and to me indicates memory problems somewhere in SensorManager.

In contrast, we have a custom GPS plugin which is also CAN-based. Its functionality is basically the same as that of our IMU plugin in that it reads data from CAN and translates it into a sensor frame. Both plugins are based on the sample code for GPS and IMU plugins provided with the DriveWorks samples. The GPS plugin works fine, both on the Host-PC and on DRIVE AGX.

How can it be that our IMU plugin works fine with sample_imu_logger but not with SensorManager on DRIVE AGX? Does SensorManager somehow allocate sensors differently in memory? Why does it work on the Host-PC but not on DRIVE AGX? Why does it work for the GPS sensor but not for the IMU sensor? I have triple-checked every potential source of error I could think of, and to me this looks like a bug within SensorManager.

Any help would be greatly appreciated.

Thanks,
Christian

Dear @cschro,
We will check internally on the issue you have indicated.
Could you increase the timeout value in dwSensorManager_acquireNextEvent() and confirm whether the problem persists? Also, please confirm whether you see the timeout error every time when using the SensorManager APIs.

Thanks for the quick response.

We have increased the timeout value to 1000000 (1 s), with CAN messages coming in at 10 Hz. We still received only sensor timeouts.
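To make the timeout units explicit, here is a minimal sketch of the arithmetic. It assumes, as the numbers in this thread suggest, that the timeout argument is in microseconds. The helper names toMicroseconds and expectedMessages are hypothetical and not part of DriveWorks.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Converts any chrono duration to a microsecond count, the unit this thread
// uses for the acquireNextEvent timeout (1000000 == 1 s).
template <class Rep, class Period>
std::int64_t toMicroseconds(std::chrono::duration<Rep, Period> d)
{
    return std::chrono::duration_cast<std::chrono::microseconds>(d).count();
}

// Rough number of messages expected inside one timeout window at a given
// message rate. Hypothetical helper for illustration only.
double expectedMessages(std::int64_t timeoutUs, double rateHz)
{
    assert(timeoutUs >= 0 && rateHz >= 0.0);
    return static_cast<double>(timeoutUs) * rateHz / 1e6;
}
```

For example, toMicroseconds(std::chrono::seconds(1)) gives the 1000000 used here, and at 10 Hz that window should contain roughly ten messages, so a timeout on every call points to missing data rather than a window that is too short.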

For the IMU sensor, we only get timeouts, even after running the program for a minute or so. The one exception was that random event a couple of days ago, when it suddenly started to work. Since then, I only receive timeout errors when using the SensorManager API, no matter what.

Dear @cschro,
So you noticed one occasion where it worked. Did you make any changes to the code or setup? Could you please share your connection details?

As mentioned, we did not change anything in our code or our setup, before and after it randomly started working.

We use a PCAN-USB Hub for our CAN connections, set up as can2. We can confirm the connection works as expected with candump:

nvidia@tegra-ubuntu:~$ candump can2
can2  005   [4]  01 3D 58 BA
can2  006   [2]  08 1E
can2  011   [4]  00 00 00 02
can2  021   [8]  7F BC FC 8C 00 D6 F8 A7
can2  032   [6]  00 01 00 03 00 03
can2  033   [8]  7F FF 00 12 00 20 00 05
can2  034   [6]  FF EB FF 73 09 C7
can2  041   [6]  FF DD 03 A5 FF 82
can2  005   [4]  01 3D 7F CA
can2  006   [2]  08 1F
can2  011   [4]  00 00 00 02
can2  021   [8]  7F BC FC 8C 00 D8 F8 A8
can2  032   [6]  00 01 00 00 00 01
can2  033   [8]  7F FF 00 12 00 21 00 06
can2  034   [6]  FF E4 FF 7B 09 CF
can2  041   [6]  FF DE 03 A5 FF 87

The messages are exactly what we expect with the correct identifiers which are processed in our custom IMU-plugin. In order to exclude the CAN-interface as a source of error, we also tried with the given can0 and can1 interfaces of DRIVE AGX. No difference, SensorManager still does not work.
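As a side note, checking the identifiers in a longer capture can be automated. The following is a minimal sketch that parses one line in the candump format shown above (interface, hex identifier, length in brackets, hex data bytes). CanFrame and parseCandumpLine are hypothetical names, not part of DriveWorks or can-utils.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// One decoded candump line. Hypothetical type for illustration.
struct CanFrame
{
    std::string iface;
    std::uint32_t id = 0;
    std::vector<std::uint8_t> data;
};

// Parses a line such as "can2  005   [4]  01 3D 58 BA".
// Returns false if the line is malformed or the byte count does not
// match the length given in brackets.
bool parseCandumpLine(const std::string& line, CanFrame& out)
{
    std::istringstream in(line);
    std::string idHex, lenTok;
    if (!(in >> out.iface >> idHex >> lenTok))
        return false;
    if (lenTok.size() < 3 || lenTok.front() != '[' || lenTok.back() != ']')
        return false;
    try
    {
        out.id = static_cast<std::uint32_t>(std::stoul(idHex, nullptr, 16));
        const std::size_t len = std::stoul(lenTok.substr(1, lenTok.size() - 2));
        out.data.clear();
        std::string byteHex;
        while (in >> byteHex)
        {
            const unsigned long v = std::stoul(byteHex, nullptr, 16);
            if (v > 0xFF)
                return false;
            out.data.push_back(static_cast<std::uint8_t>(v));
        }
        assert(out.data.size() <= 64); // sanity bound; classic CAN carries at most 8 bytes
        return out.data.size() == len;
    }
    catch (const std::exception&)
    {
        return false; // non-numeric identifier, length or data byte
    }
}
```

Feeding each captured line through parseCandumpLine and collecting the id values gives the set of identifiers on the bus, which can then be compared against the ones the plugin decodes.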

As I said, with the exact same setup and code, SensorManager works fine on Host-PC. Also sample_imu_logger works fine with our Plugin on DRIVE AGX. It is just SensorManager on DRIVE AGX which produces sensor timeouts.

Dear @cschro ,
OK, got it. Could you quickly check the return values of dwSensorManager_addSensor() and dwSensorManager_start()?

If we output return values we get:

nvidia@tegra-ubuntu:~$ ./install_isolated/lib/sensor_manager/minimal_example_node
return value dwSensorManager_addSensor: 0
return value dwSensorManager_start: 0
timeout
timeout
timeout
timeout
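For reference, a minimal sketch of how such return-value output could be produced. So that it compiles without DriveWorks, it uses a stand-in Status enum instead of dwStatus; only the value 0 for success is taken from the output above, the other values and the names formatStatus and checkStatus are hypothetical.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Stand-in for dwStatus. Only Success == 0 matches the output in this thread;
// the remaining values are placeholders.
enum class Status { Success = 0, TimeOut = 1, Failure = 2 };

// Builds the log line in the same shape as the output shown above.
std::string formatStatus(const std::string& call, Status s)
{
    assert(!call.empty());
    std::ostringstream os;
    os << "return value " << call << ": " << static_cast<int>(s);
    return os.str();
}

// Logs the status to stderr (unbuffered) and reports whether the call succeeded.
bool checkStatus(const std::string& call, Status s)
{
    std::cerr << formatStatus(call, s) << std::endl;
    return s == Status::Success;
}
```

In the real program the same pattern would wrap the dwStatus returned by each call, for example aborting start-up as soon as checkStatus returns false.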

Dear @cschro,
The issue is probably in the plugin code. Could you instrument the plugin by checking the return values of the DW API calls in readRawData(), returnRawData(), pushData() and parseDataBuffer(), and confirm whether the data is received correctly?

If that does not help, could you use the sensor APIs directly, as described in https://docs.nvidia.com/drive/driveworks-3.5/imu_usecase1.html, and confirm whether that works?

So, what is the easiest way to check the return values of the DW API calls in readRawData(), returnRawData(), pushData() and parseDataBuffer()? I tried simply printing the return values with std::cout. This works on the Host and when the sensor API is used directly, but it does not work with SensorManager on DRIVE AGX. I cannot retrieve any output from the IMU plugin when using SensorManager.
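One option when console output does not show up is to write trace lines to a file from inside the plugin. This is a minimal sketch, not a DriveWorks facility; appendTrace and the file path are hypothetical. If the file stays empty, the instrumented function was most likely never called.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Appends one line to a trace file and flushes immediately, so the entry
// does not depend on stdout being visible or flushed by the host process.
// Returns false if the file could not be opened or written.
bool appendTrace(const std::string& path, const std::string& message)
{
    assert(!path.empty());
    std::ofstream out(path, std::ios::app);
    if (!out)
        return false;
    out << message << '\n';
    out.flush();
    return static_cast<bool>(out);
}
```

A call such as appendTrace("/tmp/imu_plugin_trace.log", "readRawData called") at the top of each plugin function would show which functions run and in what order.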

As I mentioned in my first post, sample_imu_logger works perfectly fine with our custom IMU-plugin on DRIVE AGX. So I can confirm that the plugin generally works.

Again: the problem only occurs when using SensorManager on DRIVE AGX. Everything works fine on Host and when using Sensor API directly on DRIVE AGX. There should not be any bug within our plugin. We used Driveworks sample code for custom IMU-plugin that is provided by Nvidia and only implemented a few adjustments.

It just seems that SensorManager is having trouble connecting to CAN-interface, although it does not produce any error. But clearly, no data is arriving from can2-interface in SensorManager. So I am really asking myself, what does SensorManager do differently than the standard Sensor API when initializing sensors?

Hi @cschro ,

When you see this timeout issue, could you check whether the dwSensorPlugin_readRawData() implemented in your plugin is actually being called? Thanks.

I have figured out the source of error but I still do not understand why this error occurs, especially only on DRIVE AGX. Let me try to explain:

So as I mentioned, we have two custom plugins, one for IMU-sensor and one for GPS-sensor. The problems only occurred when using custom IMU-sensor on DRIVE AGX.

In order to check if API call readRawData() was being called, I debugged with gdb and set a breakpoint at function readRawData. On Host, I get the expected output:

(gdb) break readRawData
Function "readRawData" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (readRawData) pending.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/christian/absolut_projects/main_ws/install_isolated/lib/sensor_manager/minimal_example_node
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffb76f2700 (LWP 21219)]
[New Thread 0x7fffb6ef1700 (LWP 21220)]
[New Thread 0x7fffb3fff700 (LWP 21221)]

Thread 1 "minimal_example" hit Breakpoint 1, 0x00007fffb60ed920 in dw::plugins::imu::XsensIMUSensor::readRawData(unsigned char const**, unsigned long*, long)@plt ()
   from /home/christian/absolut_projects/main_ws/install_isolated/lib/libxsens_dw_sal_plugin.so
(gdb) c
Continuing.

Thread 1 "minimal_example" hit Breakpoint 1, dw::plugins::imu::XsensIMUSensor::readRawData (this=0x55555e278170, data=0x7fffffffb460, size=0x7fffffffb468, timeout_us=15)
    at /home/christian/absolut_projects/main_ws/src/imu/xsens/src/dw_sal_plugin.cpp:215
215         dwStatus readRawData(const uint8_t** data, size_t* size, dwTime_t timeout_us)
(gdb)

Then I tried on DRIVE AGX and got the following output:

(gdb) break readRawData
Function "readRawData" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (readRawData) pending.
(gdb) run
Starting program: /home/nvidia/install_isolated/lib/sensor_manager/minimal_example_node
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[New Thread 0x7f781d8570 (LWP 18820)]
[New Thread 0x7f776c4570 (LWP 18823)]
[New Thread 0x7f76ec3570 (LWP 18824)]
[New Thread 0x7f766c2570 (LWP 18825)]
[New Thread 0x7f75ec1570 (LWP 18826)]
[New Thread 0x7f756c0570 (LWP 18827)]
[New Thread 0x7f74ebf570 (LWP 18828)]
[New Thread 0x7f746be570 (LWP 18829)]
[New Thread 0x7f73ebd570 (LWP 18830)]

Thread 1 "minimal_example" hit Breakpoint 1, 0x0000007faa6664a4 in dw::plugins::gps::AnavsSensor::readRawData(unsigned char const**, unsigned long*, long)@plt ()
   from /home/nvidia/install_isolated/lib/libanavs_dw_sal_plugin.so
(gdb) c
Continuing.

Thread 1 "minimal_example" hit Breakpoint 1, dw::plugins::gps::AnavsSensor::readRawData (this=0x11eb7560, data=0x7fffffc5c0, size=0x7fffffc5c8, timeout_us=15)
    at /home/christian/absolut_projects/main_ws/src/gnss/anavs/src/dw_sal_plugin.cpp:164
164     /home/christian/absolut_projects/main_ws/src/gnss/anavs/src/dw_sal_plugin.cpp: No such file or directory.
(gdb) quit

If you check the output when the breakpoint is hit, you see that readRawData is being called, but as dw::plugins::gps::AnavsSensor::readRawData. So it seems SensorManager uses our GPS plugin instead of our IMU plugin, even though we explicitly pass the path to our custom IMU plugin .so file in the decoder-path parameter when we initialize the IMU sensor. This seems really strange, especially because it only happens on DRIVE AGX. On the Host, the correct plugin is used.

So I checked our compiling options and it seems that we accidentally linked our custom plugins (both IMU and GPS) to the executable. We are using catkin for compiling, when compiling our plugins we set the following in CMakeLists.txt:

catkin_package(
    INCLUDE_DIRS include
    LIBRARIES imu_dw_sal_plugin
    CATKIN_DEPENDS roscpp
)

and when compiling our program for SensorManager, we set:

target_link_libraries(minimal_example_node
    ${Driveworks_LIBRARIES}
    ${CUDA_LIBRARIES}
    ${CUDA_cublas_LIBRARY}
    ${catkin_LIBRARIES}
)

So I removed ${catkin_LIBRARIES} from target_link_libraries() and tried again. After compiling and testing, now our program started to work as expected on DRIVE AGX.

So the question really is: why did SensorManager choose our GPS plugin over our IMU plugin, although we specified the path to our IMU plugin .so correctly? And why does this only happen on DRIVE AGX and not on the Host, although the compile options were the same for both systems?

I am no compiling expert so if this is expected behaviour when we compile the plugin-libraries against our target, I would really appreciate if someone could explain this to me. But since this error only happened on DRIVE AGX and not on Host, this seems kind of odd. Especially when parameter decoder-path is handed to SensorManager and it suddenly decides to use a different plugin.so than the one specified.

Thanks,
Christian

Your libimu_plugin.so is loaded by calling dlopen() in our library. I don’t have any idea how removing ${catkin_LIBRARIES} can help on this issue. Could you try with clean build without removing it?

I tried with a clean build, the problem is still there. SensorManager chooses our GPS-plugin over the IMU-plugin, although the parameter decoder-path points to our IMU-plugin.so. So far, only removing ${catkin_LIBRARIES} solves the problem.

Still, this only occurs on DRIVE AGX. On Host, I can compile without removing ${catkin_LIBRARIES} and both SensorManager and our IMU-Plugin work fine.

Comparing the linker commands generated with and without ${catkin_LIBRARIES} may give you some ideas.