Nvsipl_camera: Intermittent 0 fps / "FusaVI: first slice fence wait failed" after ungraceful reboot on Thor

DRIVE OS Version: 7.0.3

Issue Description:

Platform

  • SoC: NVIDIA DRIVE Thor
  • DriveOS: 7.0.3
  • Cameras: 7x Sony IMX728 (RGGB)
  • Binary: nvsipl_camera sample from DriveOS 7.0.3 SDK, built from source
  • Mask: -m "0x1111 0x0000 0x0000 0x1101"
  • Config: -c IMX728_96724_RGGB_CPHY_x4

Problem

We originally hit this issue in our own application that uses NVSIPL for camera capture. To
isolate the problem, we reproduced it using NVIDIA’s nvsipl_camera sample built from SDK
source - the behavior is identical.

After an ungraceful reboot (via trigger_sys_reboot), nvsipl_camera --showfps intermittently
reports 0 fps on all cameras with FusaVI: first slice fence wait failed errors. The issue is
not deterministic - some reboots work fine, others don’t.

We run the binary as a systemd service that starts automatically after nv_nvsciipc_init.service.

Key observations

  1. Ungraceful reboot → intermittent failure. We trigger reboot via:

    echo 1 | sudo tee /sys/class/tegra_hv_pm_ctl/tegra_hv_pm_ctl/device/trigger_sys_reboot
    

    Out of 6 consecutive ungraceful reboots, ~2 resulted in 0 fps / FusaVI errors.

  2. Adding a 60s delay before starting the binary reduces but does not eliminate the issue.
    With ExecStartPre=/bin/sleep 60 in the systemd unit, we still see occasional failures
    (~1 in 5 reboots).

  3. Running the binary manually (after stopping the service) always works, even after a
    failed service attempt - but only after another reboot. If cameras are already in a failed
    state, restarting the binary does not recover them. We also implemented an autorecovery
    mechanism similar to nvsipl_camera’s built-in recovery in our own app, but it does not
    recover from this state either.

  4. Manual run after ungraceful reboot (with service disabled) always works.
    We disabled the service, triggered an ungraceful reboot, then ran the binary
    manually - all 7 cameras at ~30fps, 0 drops.

Error sequence (failure case)

The service starts, pipeline initializes (sensor detection, NITO files loaded), then:

[TS:86666416176] Module_id 30 Severity 2 : FusaVI: first slice fence wait failed
[TS:86666435416] Module_id 30 Severity 2 : FusaVI: first slice fence wait failed
[TS:86666454490] Module_id 30 Severity 2 : FusaVI: first slice fence wait failed
[TS:86666474833] Module_id 30 Severity 2 : FusaVI: first slice fence wait failed
[TS:86666402490] Module_id 30 Severity 2 : FusaVI: first slice fence wait failed
[TS:86666499314] Module_id 30 Severity 2 : FusaVI: first slice fence wait failed
[TS:86666514629] Module_id 30 Severity 2 : FusaVI: first slice fence wait failed

Followed by timeouts:

[TS:88682373554] Module_id 30 Severity 2 : (fusa) Error: Timeout capture waiting for status timeout in: fusaViChannelLinux.cpp 370
[TS:88682402109] Module_id 30 Severity 2 : (fusa) Error: Timeout  propagating from: fusaViHandler.cpp 907
[TS:88682456433] Module_id 30 Severity 2 : Getting capture status failed!

Kernel dmesg confirms VI-level failures:

tegra194-vi5 8181240000.host1x:vi1@8188c00000: vi capture get status failed
tegra194-vi5 8181240000.host1x:vi0@8188400000: vi capture get status failed

All cameras then report 0 fps:

Sensor0_Out1   Frame rate (fps):   0
Sensor1_Out1   Frame rate (fps):   0
...
Sensor15_Out1  Frame rate (fps):   0

Successful run (same binary, same reboot method)

No FusaVI errors, all cameras streaming:

Sensor0_Out1   Frame rate (fps):   31.4989
Sensor1_Out1   Frame rate (fps):   31.4989
...
Sensor15_Out1  Frame rate (fps):   31.4989

Systemd service unit

[Unit]
Description=nvsipl_camera standalone test
After=nv_nvsciipc_init.service
Requires=nv_nvsciipc_init.service

[Service]
Type=simple
ExecStartPre=/bin/sleep 60
ExecStart=/usr/bin/stdbuf -oL /home/.bin/nvsipl_camera_from_source \
    -c IMX728_96724_RGGB_CPHY_x4 \
    -m "0x1111 0x0000 0x0000 0x1101" \
    --showfps
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Questions

  1. What is the recommended way to ensure camera/VI hardware is ready after an ungraceful
    reboot before starting nvsipl_camera? Is there a service or device node we should
    wait for beyond nv_nvsciipc_init.service?

  2. Once the cameras enter this failed state (0 fps, FusaVI fence errors), is there any
    recovery mechanism short of a full reboot? We’ve tried stopping and restarting the binary
    but it doesn’t help.

  3. Are there known issues with VI/NVCSI initialization after ungraceful shutdown on Thor
    with DriveOS 7.0.3?

  4. Is there any additional information we can collect (traces, register dumps, etc.) that
    would help debug this issue?

Dear @siarhei.fedartsou ,
May I know why you are doing ungrateful reboot. Could you use common_if_testapp -mcureset . Could you please check SoC to MCU Communication — NVIDIA DriveOS 7.0.3 Linux SDK Developer Guide

Thanks for your response! Yes, in usual circumstances we should do graceful response, but abrupt power off is always possible and ideally our software/hardware must be able to restore to working state even after non graceful shutdown. WDYT?
Thanks for this link, I will read it carefully and return back to this topic.

@SivaRamaKrishnaNV I have some automated scripts which reproduce the issue and I quickly changed our old way of reboot (echo 1 | sudo tee ...trigger_sys_reboot ) to graceful one with common_if_testapp -mcureset, but unfortunately final result is the same. Since we start our systemd service at boot time, I guess the issue can be because of boot order, so that we have race condition with one of other system services? WDYT? Can it be a problem?

Dear @siarhei.fedartsou ,
Could you check --autorecovery flag helps ? Please see Automatic Link Recovery — NVIDIA DriveOS 7.0.3 Linux SDK Developer Guide

Hi @SivaRamaKrishnaNV,

Thanks for the suggestion. We tested --autorecovery - unfortunately it does not recover from this state.

We ran nvsipl_camera built from SDK source with --autorecovery --showfps as a systemd service. After a reboot that triggered the failure, autorecovery attempted DisableLink / EnableLink but EnableLink consistently failed. Each attempt fails with Sensor Authentication failed followed by Reconfigure failed for camera module:

Feb 26 14:09:43 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_test[2046]: DisableLink: 14 successful

Feb 26 14:10:55 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:123414065391] Module_id 30 Severity 2 : CNvMMAX96724CameraModule.cpp 573
Feb 26 14:10:55 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:123414077011] Module_id 30 Severity 2 : Sensor Authentication failed with SIPL error 2
Feb 26 14:10:55 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:123421950678] Module_id 30 Severity 2 : CNvMDeviceBlock.cpp 2114
Feb 26 14:10:55 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:123421956484] Module_id 30 Severity 2 : Reconfigure failed for camera module CNvMDeviceBlock::EnableLink. Link 0
Feb 26 14:10:55 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:123421959669] Module_id 30 Severity 2 : CNvMDeviceBlock.cpp 2207
Feb 26 14:10:55 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:123421962909] Module_id 30 Severity 2 : Link DoReconfigureCameraModule failed with SIPL error 2
Feb 26 14:10:55 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:123421967113] Module_id 30 Severity 2 : CNvMCamera.cpp 4065
Feb 26 14:10:55 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:123421969595] Module_id 30 Severity 2 : EnableLink failed for device block index and link index 0 0

Feb 26 14:12:01 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:189637190655] Module_id 30 Severity 2 : CNvMMAX96724CameraModule.cpp 573
Feb 26 14:12:01 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:189637204099] Module_id 30 Severity 2 : Sensor Authentication failed with SIPL error 2
Feb 26 14:12:01 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:189645065914] Module_id 30 Severity 2 : CNvMDeviceBlock.cpp 2114
Feb 26 14:12:01 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:189645070071] Module_id 30 Severity 2 : Reconfigure failed for camera module CNvMDeviceBlock::EnableLink. Link 1
Feb 26 14:12:01 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:189645073581] Module_id 30 Severity 2 : CNvMDeviceBlock.cpp 2207
Feb 26 14:12:01 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:189645076433] Module_id 30 Severity 2 : Link DoReconfigureCameraModule failed with SIPL error 2
Feb 26 14:12:01 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:189645080358] Module_id 30 Severity 2 : CNvMCamera.cpp 4065
Feb 26 14:12:01 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:189645083146] Module_id 30 Severity 2 : EnableLink failed for device block index and link index 0 1

Feb 26 14:13:08 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:255898958016] Module_id 30 Severity 2 : CNvMMAX96724CameraModule.cpp 573
Feb 26 14:13:08 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:255899000683] Module_id 30 Severity 2 : Sensor Authentication failed with SIPL error 2
Feb 26 14:13:08 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:255906774034] Module_id 30 Severity 2 : CNvMDeviceBlock.cpp 2114
Feb 26 14:13:08 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:255906780682] Module_id 30 Severity 2 : Reconfigure failed for camera module CNvMDeviceBlock::EnableLink. Link 2
Feb 26 14:13:08 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:255906784275] Module_id 30 Severity 2 : CNvMDeviceBlock.cpp 2207
Feb 26 14:13:08 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:255906787025] Module_id 30 Severity 2 : Link DoReconfigureCameraModule failed with SIPL error 2
Feb 26 14:13:08 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:255906791599] Module_id 30 Severity 2 : CNvMCamera.cpp 4065
Feb 26 14:13:08 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_from_source[2046]: [TS:255906794312] Module_id 30 Severity 2 : EnableLink failed for device block index and link index 0 2

All sensors remain at 0 fps throughout:

Feb 26 14:11:13 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_test[2046]: Sensor0_Out1        Frame rate (fps):                0
Feb 26 14:11:13 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_test[2046]: Sensor1_Out1        Frame rate (fps):                0
Feb 26 14:11:13 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_test[2046]: Sensor2_Out1        Frame rate (fps):                0
Feb 26 14:11:13 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_test[2046]: Sensor3_Out1        Frame rate (fps):                0
Feb 26 14:11:13 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_test[2046]: Sensor12_Out1        Frame rate (fps):                0
Feb 26 14:11:13 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_test[2046]: Sensor14_Out1        Frame rate (fps):                0
Feb 26 14:11:13 thor-1.vr38.ad-tech.boltint.net nvsipl_camera_test[2046]: Sensor15_Out1        Frame rate (fps):                0

Besides that we were testing EnableLink/DisableLink with our own code before and it didn’t help.

One correction to our original post: after further testing, we found that the issue is not limited to ungraceful reboots. We can also reproduce it after a graceful reboot, and even after simply restarting the service via sudo systemctl restart (where SIPL shuts down gracefully via Stop/Deinit before the process exits).

Updated systemd unit used for this test:

[Unit]
Description=nvsipl_camera standalone test
After=nv_nvsciipc_init.service
Requires=nv_nvsciipc_init.service

[Service]
Type=simple
ExecStart=/usr/bin/stdbuf -oL /home/.bin/nvsipl_camera_from_source \
    -c IMX728_96724_RGGB_CPHY_x4 \
    -m "0x1111 0x0000 0x0000 0x1101" \
    --showfps --autorecovery
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Do you may be have any other recommendations?

One more update. I built minimal sample demonstrating the issue. See it in the end of the post. It depends only on Nvidia libraries and STL. And also I noticed that in order to reproduce issue we don’t even need any reboot or systemd service, it is enough to build it and re-run several times within the same boot. After 2-3 attempts usually it fails because cannot obtain any frames after successful init of SIPL.

What can be a root cause of the problem?

// Minimal raw SIPL camera init test - zero dependencies.
// Tests whether NvSIPL camera capture works at the current optimization level.
// Exit code: 0 = frames received, 1 = no frames (init or capture failure).

#include <chrono>
#include <cstdarg>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <ctime>
#include <fstream>
#include <map>
#include <optional>
#include <string>
#include <thread>
#include <vector>

#include "NvSIPLCamera.hpp"
#include "NvSIPLClient.hpp"
#include "NvSIPLPipelineMgr.hpp"
#include "NvSIPLQuery.hpp"

#include "nvscibuf.h"
#include "nvscisync.h"

// --- Simple logging (no external deps) ---

static FILE* g_log_file = nullptr;

__attribute__((format(printf, 2, 3))) static void log_msg(const char* level, const char* fmt, ...)
{
  auto now = std::chrono::system_clock::now();
  auto t = std::chrono::system_clock::to_time_t(now);
  auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(now.time_since_epoch()) % 1000;
  struct tm tm_buf{};
  localtime_r(&t, &tm_buf);
  char ts[32];
  std::snprintf(ts, sizeof(ts), "%04d-%02d-%02d %02d:%02d:%02d.%03d", tm_buf.tm_year + 1900, tm_buf.tm_mon + 1,
                tm_buf.tm_mday, tm_buf.tm_hour, tm_buf.tm_min, tm_buf.tm_sec, static_cast<int>(ms.count()));

  va_list args;
  va_start(args, fmt);
  std::fprintf(stderr, "[%s] [%s] ", ts, level);
  std::vfprintf(stderr, fmt, args);
  std::fprintf(stderr, "\n");
  std::fflush(stderr);
  va_end(args);

  if (g_log_file != nullptr)
  {
    va_start(args, fmt);
    std::fprintf(g_log_file, "[%s] [%s] ", ts, level);
    std::vfprintf(g_log_file, fmt, args);
    std::fprintf(g_log_file, "\n");
    std::fflush(g_log_file);
    va_end(args);
  }
}

#define LOG_INFO(...) log_msg("info", __VA_ARGS__)
#define LOG_ERROR(...) log_msg("error", __VA_ARGS__)
#define LOG_WARN(...) log_msg("warn", __VA_ARGS__)

// --- Configuration ---
static constexpr const char* PLATFORM_CFG_NAME = "IMX728_96724_RGGB_CPHY_x4";
static const std::vector<std::uint32_t> LINK_MASKS = {0x1111, 0x0000, 0x0000, 0x1101};
static constexpr std::uint32_t NUM_ICP_BUFFERS = 6;
static constexpr std::uint32_t NUM_ISP0_BUFFERS = 4;
static constexpr const char* NITO_FOLDER = "/usr/share/camera/";
static constexpr int INIT_POLL_ITERATIONS = 50;
static constexpr int INIT_POLL_INTERVAL_MS = 100;
static constexpr std::uint32_t FRAME_TIMEOUT_US = 1'000'000; // 1s blocking timeout per sensor
static constexpr int FPS_LOG_INTERVAL_S = 5;

// --- Helpers ---

static std::optional<std::vector<std::uint8_t>> read_file(const std::string& path)
{
  std::ifstream file(path, std::ios::binary | std::ios::ate);
  if (!file.is_open())
  {
    return std::nullopt;
  }
  const auto size = file.tellg();
  file.seekg(0, std::ios::beg);
  std::vector<std::uint8_t> data(static_cast<std::size_t>(size));
  if (!file.read(reinterpret_cast<char*>(data.data()), size))
  {
    return std::nullopt;
  }
  return data;
}

template <typename T>
static void set_buf_attr(NvSciBufAttrList list, NvSciBufAttrKey key, T value)
{
  // NOLINTNEXTLINE(cppcoreguidelines-pro-type-const-cast)
  NvSciBufAttrKeyValuePair kvp = {key, const_cast<void*>(static_cast<const void*>(&value)), sizeof(T)};
  auto err = NvSciBufAttrListSetAttrs(list, &kvp, 1);
  if (err != NvSciError_Success)
  {
    LOG_ERROR("NvSciBufAttrListSetAttrs failed: key=%d, err=%d", static_cast<int>(key), static_cast<int>(err));
  }
}

template <typename T>
static void set_sync_attr(NvSciSyncAttrList list, NvSciSyncAttrKey key, T value)
{
  // NOLINTNEXTLINE(cppcoreguidelines-pro-type-const-cast)
  NvSciSyncAttrKeyValuePair kvp = {key, const_cast<void*>(static_cast<const void*>(&value)), sizeof(T)};
  auto err = NvSciSyncAttrListSetAttrs(list, &kvp, 1);
  if (err != NvSciError_Success)
  {
    LOG_ERROR("NvSciSyncAttrListSetAttrs failed: key=%d, err=%d", static_cast<int>(key), static_cast<int>(err));
  }
}

// --- Per-sensor state for cleanup ---
struct SensorState
{
  std::string module_name;
  nvsipl::INvSIPLFrameCompletionQueue* isp0_queue = nullptr;
  nvsipl::INvSIPLFrameCompletionQueue* icp_queue = nullptr;

  // ICP buffers
  NvSciBufAttrList icp_attr_list = nullptr;
  NvSciBufAttrList icp_reconciled = nullptr;
  NvSciBufAttrList icp_conflict = nullptr;
  std::vector<NvSciBufObj> icp_bufs;

  // ISP0 buffers
  NvSciBufAttrList isp0_attr_list = nullptr;
  NvSciBufAttrList isp0_reconciled = nullptr;
  NvSciBufAttrList isp0_conflict = nullptr;
  std::vector<NvSciBufObj> isp0_bufs;

  // ISP0 EOF sync
  NvSciSyncAttrList sync_signaler = nullptr;
  NvSciSyncAttrList sync_waiter = nullptr;
  NvSciSyncAttrList sync_reconciled = nullptr;
  NvSciSyncAttrList sync_conflict = nullptr;
  NvSciSyncObj sync_obj = nullptr;
};

static void cleanup_sensor(SensorState& s)
{
  for (auto& buf : s.icp_bufs)
  {
    if (buf != nullptr)
    {
      NvSciBufObjFree(buf);
    }
  }
  for (auto& buf : s.isp0_bufs)
  {
    if (buf != nullptr)
    {
      NvSciBufObjFree(buf);
    }
  }
  if (s.icp_attr_list != nullptr)
  {
    NvSciBufAttrListFree(s.icp_attr_list);
  }
  if (s.icp_reconciled != nullptr)
  {
    NvSciBufAttrListFree(s.icp_reconciled);
  }
  if (s.icp_conflict != nullptr)
  {
    NvSciBufAttrListFree(s.icp_conflict);
  }
  if (s.isp0_attr_list != nullptr)
  {
    NvSciBufAttrListFree(s.isp0_attr_list);
  }
  if (s.isp0_reconciled != nullptr)
  {
    NvSciBufAttrListFree(s.isp0_reconciled);
  }
  if (s.isp0_conflict != nullptr)
  {
    NvSciBufAttrListFree(s.isp0_conflict);
  }
  if (s.sync_signaler != nullptr)
  {
    NvSciSyncAttrListFree(s.sync_signaler);
  }
  if (s.sync_waiter != nullptr)
  {
    NvSciSyncAttrListFree(s.sync_waiter);
  }
  if (s.sync_reconciled != nullptr)
  {
    NvSciSyncAttrListFree(s.sync_reconciled);
  }
  if (s.sync_conflict != nullptr)
  {
    NvSciSyncAttrListFree(s.sync_conflict);
  }
  if (s.sync_obj != nullptr)
  {
    NvSciSyncObjFree(s.sync_obj);
  }
}

static std::string parse_log_file(int argc, char** argv)
{
  for (int i = 1; i < argc - 1; i++)
  {
    if (std::string(argv[i]) == "--log-file")
    {
      return argv[i + 1]; // NOLINT(cppcoreguidelines-pro-bounds-pointer-arithmetic)
    }
  }
  return "";
}

int main(int argc, char** argv)
{
  std::string log_file = parse_log_file(argc, argv);
  if (!log_file.empty())
  {
    g_log_file = std::fopen(log_file.c_str(), "w");
    if (g_log_file != nullptr)
    {
      LOG_INFO("Logging to file: %s", log_file.c_str());
    }
  }

  LOG_INFO("=== SIPL Init Test starting ===");

  NvSciBufModule buf_module = nullptr;
  NvSciSyncModule sync_module = nullptr;
  std::unique_ptr<nvsipl::INvSIPLCamera> sipl_camera;
  std::map<std::uint32_t, SensorState> sensors;
  bool success = false;
  bool started = false;

  auto cleanup = [&]()
  {
    if (sipl_camera)
    {
      if (started)
      {
        LOG_INFO("Stopping...");
        sipl_camera->Stop();
      }
      LOG_INFO("Deinitializing...");
      sipl_camera->Deinit();
      sipl_camera.reset();
    }
    for (auto& [id, s] : sensors)
    {
      cleanup_sensor(s);
    }
    if (sync_module != nullptr)
    {
      NvSciSyncModuleClose(sync_module);
    }
    if (buf_module != nullptr)
    {
      NvSciBufModuleClose(buf_module);
    }
    if (g_log_file != nullptr)
    {
      std::fclose(g_log_file);
      g_log_file = nullptr;
    }
  };

  try
  {
    // 1. Open NvSci modules
    LOG_INFO("Opening NvSciBuf module...");
    auto err = NvSciBufModuleOpen(&buf_module);
    if (err != NvSciError_Success)
    {
      LOG_ERROR("NvSciBufModuleOpen failed: %d", static_cast<int>(err));
      cleanup();
      return 1;
    }

    LOG_INFO("Opening NvSciSync module...");
    err = NvSciSyncModuleOpen(&sync_module);
    if (err != NvSciError_Success)
    {
      LOG_ERROR("NvSciSyncModuleOpen failed: %d", static_cast<int>(err));
      cleanup();
      return 1;
    }

    // 2. Load platform config
    LOG_INFO("Loading platform config: %s", PLATFORM_CFG_NAME);
    auto p_query = nvsipl::INvSIPLQuery::GetInstance();
    if (p_query == nullptr)
    {
      LOG_ERROR("Failed to get INvSIPLQuery instance");
      cleanup();
      return 1;
    }

    auto status = p_query->ParseDatabase();
    if (status != nvsipl::NVSIPL_STATUS_OK)
    {
      LOG_ERROR("ParseDatabase failed: %d", static_cast<int>(status));
      cleanup();
      return 1;
    }

    nvsipl::PlatformCfg platform_cfg{};
    status = p_query->GetPlatformCfg(PLATFORM_CFG_NAME, platform_cfg);
    if (status != nvsipl::NVSIPL_STATUS_OK)
    {
      LOG_ERROR("GetPlatformCfg failed: %d", static_cast<int>(status));
      cleanup();
      return 1;
    }

    status = p_query->ApplyMask(platform_cfg, LINK_MASKS);
    if (status != nvsipl::NVSIPL_STATUS_OK)
    {
      LOG_ERROR("ApplyMask failed: %d", static_cast<int>(status));
      cleanup();
      return 1;
    }

    // Set resetAll on device blocks and deserializers
    for (std::uint32_t d = 0; d < platform_cfg.numDeviceBlocks; d++)
    {
      platform_cfg.deviceBlockList[d].resetAll = true;
      platform_cfg.deviceBlockList[d].deserInfo.resetAll = true;
    }

    LOG_INFO("Platform: %s, numDeviceBlocks: %u", platform_cfg.platform.c_str(), platform_cfg.numDeviceBlocks);

    // 3. Create SIPL camera
    LOG_INFO("Creating INvSIPLCamera...");
    sipl_camera = nvsipl::INvSIPLCamera::GetInstance();
    if (sipl_camera == nullptr)
    {
      LOG_ERROR("Failed to get INvSIPLCamera instance");
      cleanup();
      return 1;
    }

    nvsipl::NvSIPLDeviceBlockQueues device_block_queues{};
    status = sipl_camera->SetPlatformCfg(&platform_cfg, device_block_queues);
    if (status != nvsipl::NVSIPL_STATUS_OK)
    {
      LOG_ERROR("SetPlatformCfg failed: %d", static_cast<int>(status));
      cleanup();
      return 1;
    }
    LOG_INFO("SetPlatformCfg OK");

    // 4. Configure pipelines (ISP0 + ICP only)
    for (std::uint32_t d = 0; d < platform_cfg.numDeviceBlocks; d++)
    {
      const auto& db = platform_cfg.deviceBlockList[d];
      for (std::uint32_t m = 0; m < db.numCameraModules; m++)
      {
        const auto& module = db.cameraModuleInfoList[m];
        const std::uint32_t sensor_id = module.sensorInfo.id;

        nvsipl::NvSIPLPipelineConfiguration pipeline_cfg{};
        pipeline_cfg.captureOutputRequested = true;
        pipeline_cfg.isp0OutputRequested = true;
        pipeline_cfg.isp1OutputRequested = false;
        pipeline_cfg.isp2OutputRequested = false;
        pipeline_cfg.disableSubframe = false;

        nvsipl::NvSIPLPipelineQueues pipeline_queues{};
        status = sipl_camera->SetPipelineCfg(sensor_id, pipeline_cfg, pipeline_queues);
        if (status != nvsipl::NVSIPL_STATUS_OK)
        {
          LOG_ERROR("SetPipelineCfg failed for sensor %u: %d", sensor_id, static_cast<int>(status));
          cleanup();
          return 1;
        }

        SensorState ss;
        ss.module_name = module.name;
        ss.isp0_queue = pipeline_queues.isp0CompletionQueue;
        ss.icp_queue = pipeline_queues.captureCompletionQueue;
        sensors[sensor_id] = std::move(ss);

        LOG_INFO("Sensor %u: pipeline configured, module='%s'", sensor_id, module.name.c_str());
      }
    }

    // 5. Allocate buffers (ICP + ISP0 for each sensor)
    for (auto& [sensor_id, ss] : sensors)
    {
      // --- ICP buffers ---
      err = NvSciBufAttrListCreate(buf_module, &ss.icp_attr_list);
      if (err != NvSciError_Success)
      {
        LOG_ERROR("NvSciBufAttrListCreate(ICP) failed for sensor %u: %d", sensor_id, static_cast<int>(err));
        cleanup();
        return 1;
      }
      set_buf_attr(ss.icp_attr_list, NvSciBufGeneralAttrKey_Types, NvSciBufType_Image);
      set_buf_attr(ss.icp_attr_list, NvSciBufGeneralAttrKey_RequiredPerm, NvSciBufAccessPerm_ReadWrite);

      status = sipl_camera->GetImageAttributes(sensor_id, nvsipl::INvSIPLClient::ConsumerDesc::OutputType::ICP,
                                                ss.icp_attr_list);
      if (status != nvsipl::NVSIPL_STATUS_OK)
      {
        LOG_ERROR("GetImageAttributes(ICP) failed for sensor %u: %d", sensor_id, static_cast<int>(status));
        cleanup();
        return 1;
      }

      NvSciBufAttrList icp_input[] = {ss.icp_attr_list};
      err = NvSciBufAttrListReconcile(icp_input, 1, &ss.icp_reconciled, &ss.icp_conflict);
      if (err != NvSciError_Success)
      {
        LOG_ERROR("NvSciBufAttrListReconcile(ICP) failed for sensor %u: %d", sensor_id, static_cast<int>(err));
        cleanup();
        return 1;
      }

      ss.icp_bufs.resize(NUM_ICP_BUFFERS, nullptr);
      for (std::uint32_t i = 0; i < NUM_ICP_BUFFERS; i++)
      {
        err = NvSciBufObjAlloc(ss.icp_reconciled, &ss.icp_bufs[i]);
        if (err != NvSciError_Success)
        {
          LOG_ERROR("NvSciBufObjAlloc(ICP) failed for sensor %u buf %u: %d", sensor_id, i, static_cast<int>(err));
          cleanup();
          return 1;
        }
      }
      LOG_INFO("Sensor %u: allocated %u ICP buffers", sensor_id, NUM_ICP_BUFFERS);

      // --- ISP0 buffers ---
      err = NvSciBufAttrListCreate(buf_module, &ss.isp0_attr_list);
      if (err != NvSciError_Success)
      {
        LOG_ERROR("NvSciBufAttrListCreate(ISP0) failed for sensor %u: %d", sensor_id, static_cast<int>(err));
        cleanup();
        return 1;
      }
      set_buf_attr(ss.isp0_attr_list, NvSciBufGeneralAttrKey_Types, NvSciBufType_Image);
      set_buf_attr(ss.isp0_attr_list, NvSciBufGeneralAttrKey_RequiredPerm, NvSciBufAccessPerm_ReadWrite);

      status = sipl_camera->GetImageAttributes(sensor_id, nvsipl::INvSIPLClient::ConsumerDesc::OutputType::ISP0,
                                                ss.isp0_attr_list);
      if (status != nvsipl::NVSIPL_STATUS_OK)
      {
        LOG_ERROR("GetImageAttributes(ISP0) failed for sensor %u: %d", sensor_id, static_cast<int>(status));
        cleanup();
        return 1;
      }

      NvSciBufAttrList isp0_input[] = {ss.isp0_attr_list};
      err = NvSciBufAttrListReconcile(isp0_input, 1, &ss.isp0_reconciled, &ss.isp0_conflict);
      if (err != NvSciError_Success)
      {
        LOG_ERROR("NvSciBufAttrListReconcile(ISP0) failed for sensor %u: %d", sensor_id, static_cast<int>(err));
        cleanup();
        return 1;
      }

      ss.isp0_bufs.resize(NUM_ISP0_BUFFERS, nullptr);
      for (std::uint32_t i = 0; i < NUM_ISP0_BUFFERS; i++)
      {
        err = NvSciBufObjAlloc(ss.isp0_reconciled, &ss.isp0_bufs[i]);
        if (err != NvSciError_Success)
        {
          LOG_ERROR("NvSciBufObjAlloc(ISP0) failed for sensor %u buf %u: %d", sensor_id, i, static_cast<int>(err));
          cleanup();
          return 1;
        }
      }
      LOG_INFO("Sensor %u: allocated %u ISP0 buffers", sensor_id, NUM_ISP0_BUFFERS);
    }

    // 5b. Allocate EOF sync objects (ISP0 only)
    for (auto& [sensor_id, ss] : sensors)
    {
      // Signaler attrs (SIPL fills these)
      err = NvSciSyncAttrListCreate(sync_module, &ss.sync_signaler);
      if (err != NvSciError_Success)
      {
        LOG_ERROR("NvSciSyncAttrListCreate(signaler) failed for sensor %u: %d", sensor_id, static_cast<int>(err));
        cleanup();
        return 1;
      }

      status = sipl_camera->FillNvSciSyncAttrList(
          sensor_id, nvsipl::INvSIPLClient::ConsumerDesc::OutputType::ISP0, ss.sync_signaler, nvsipl::SIPL_SIGNALER);
      if (status != nvsipl::NVSIPL_STATUS_OK)
      {
        LOG_ERROR("FillNvSciSyncAttrList(signaler) failed for sensor %u: %d", sensor_id, static_cast<int>(status));
        cleanup();
        return 1;
      }

      // Waiter attrs (our requirements)
      err = NvSciSyncAttrListCreate(sync_module, &ss.sync_waiter);
      if (err != NvSciError_Success)
      {
        LOG_ERROR("NvSciSyncAttrListCreate(waiter) failed for sensor %u: %d", sensor_id, static_cast<int>(err));
        cleanup();
        return 1;
      }
      set_sync_attr(ss.sync_waiter, NvSciSyncAttrKey_NeedCpuAccess, false);
      set_sync_attr(ss.sync_waiter, NvSciSyncAttrKey_RequiredPerm, NvSciSyncAccessPerm_WaitOnly);

      // Reconcile signaler + waiter
      NvSciSyncAttrList sync_input[] = {ss.sync_signaler, ss.sync_waiter};
      err = NvSciSyncAttrListReconcile(sync_input, 2, &ss.sync_reconciled, &ss.sync_conflict);
      if (err != NvSciError_Success)
      {
        LOG_ERROR("NvSciSyncAttrListReconcile failed for sensor %u: %d", sensor_id, static_cast<int>(err));
        cleanup();
        return 1;
      }

      err = NvSciSyncObjAlloc(ss.sync_reconciled, &ss.sync_obj);
      if (err != NvSciError_Success)
      {
        LOG_ERROR("NvSciSyncObjAlloc failed for sensor %u: %d", sensor_id, static_cast<int>(err));
        cleanup();
        return 1;
      }
      LOG_INFO("Sensor %u: allocated EOF sync object", sensor_id);
    }

    // 6. Init
    LOG_INFO("Calling Init()...");
    status = sipl_camera->Init();
    if (status != nvsipl::NVSIPL_STATUS_OK)
    {
      LOG_ERROR("Init() failed: %d", static_cast<int>(status));
      cleanup();
      return 1;
    }
    LOG_INFO("Init() OK");

    // 7. Register buffers
    for (auto& [sensor_id, ss] : sensors)
    {
      // Register ICP
      status = sipl_camera->RegisterImages(sensor_id, nvsipl::INvSIPLClient::ConsumerDesc::OutputType::ICP,
                                            ss.icp_bufs);
      if (status != nvsipl::NVSIPL_STATUS_OK)
      {
        LOG_ERROR("RegisterImages(ICP) failed for sensor %u: %d", sensor_id, static_cast<int>(status));
        cleanup();
        return 1;
      }

      // Register ISP0
      status = sipl_camera->RegisterImages(sensor_id, nvsipl::INvSIPLClient::ConsumerDesc::OutputType::ISP0,
                                            ss.isp0_bufs);
      if (status != nvsipl::NVSIPL_STATUS_OK)
      {
        LOG_ERROR("RegisterImages(ISP0) failed for sensor %u: %d", sensor_id, static_cast<int>(status));
        cleanup();
        return 1;
      }

      LOG_INFO("Sensor %u: buffers registered", sensor_id);
    }

    // 8. Register EOF sync objects
    for (auto& [sensor_id, ss] : sensors)
    {
      status = sipl_camera->RegisterNvSciSyncObj(sensor_id, nvsipl::INvSIPLClient::ConsumerDesc::OutputType::ISP0,
                                                  nvsipl::NVSIPL_EOFSYNCOBJ, ss.sync_obj);
      if (status != nvsipl::NVSIPL_STATUS_OK)
      {
        LOG_ERROR("RegisterNvSciSyncObj failed for sensor %u: %d", sensor_id, static_cast<int>(status));
        cleanup();
        return 1;
      }
      LOG_INFO("Sensor %u: EOF sync registered", sensor_id);
    }

    // 9. Register auto control plugins (NITO files)
    for (auto& [sensor_id, ss] : sensors)
    {
      std::string nito_path = std::string(NITO_FOLDER) + ss.module_name + ".nito";
      auto nito_blob = read_file(nito_path);
      if (!nito_blob)
      {
        LOG_ERROR("Failed to read NITO file: %s", nito_path.c_str());
        cleanup();
        return 1;
      }
      LOG_INFO("Sensor %u: loaded NITO %s (%zu bytes)", sensor_id, nito_path.c_str(), nito_blob->size());

      status = sipl_camera->RegisterAutoControlPlugin(sensor_id, nvsipl::NV_PLUGIN, nullptr, *nito_blob);
      if (status != nvsipl::NVSIPL_STATUS_OK)
      {
        LOG_ERROR("RegisterAutoControlPlugin failed for sensor %u: %d", sensor_id, static_cast<int>(status));
        cleanup();
        return 1;
      }
    }

    // 10. Start
    LOG_INFO("Calling Start()...");
    status = sipl_camera->Start();
    if (status != nvsipl::NVSIPL_STATUS_OK)
    {
      LOG_ERROR("Start() failed: %d", static_cast<int>(status));
      cleanup();
      return 1;
    }
    started = true;
    LOG_INFO("Start() OK - waiting for first frame...");

    // 11. Initial check: poll for first frame (5s timeout, 100ms intervals)
    for (int i = 0; i < INIT_POLL_ITERATIONS && !success; i++)
    {
      for (auto& [sensor_id, ss] : sensors)
      {
        // Drain ICP queue to prevent buffer starvation
        if (ss.icp_queue != nullptr)
        {
          nvsipl::INvSIPLClient::INvSIPLBuffer* icp_buf = nullptr;
          while (ss.icp_queue->Get(icp_buf, 0U) == nvsipl::NVSIPL_STATUS_OK && icp_buf != nullptr)
          {
            icp_buf->Release();
          }
        }

        if (ss.isp0_queue == nullptr)
        {
          continue;
        }

        nvsipl::INvSIPLClient::INvSIPLBuffer* p_buffer = nullptr;
        auto q_status = ss.isp0_queue->Get(p_buffer, 0U); // non-blocking
        if (q_status == nvsipl::NVSIPL_STATUS_OK && p_buffer != nullptr)
        {
          LOG_INFO("SUCCESS: first frame from sensor %u on iteration %d", sensor_id, i);
          p_buffer->Release();
          success = true;
          break;
        }
      }
      if (!success)
      {
        std::this_thread::sleep_for(std::chrono::milliseconds(INIT_POLL_INTERVAL_MS));
      }
    }

    if (!success)
    {
      LOG_ERROR("FAILED: no frames received after %dms init period", INIT_POLL_ITERATIONS * INIT_POLL_INTERVAL_MS);
      cleanup();
      return 1;
    }

    // 12. Infinite round-robin frame drain loop
    LOG_INFO("Draining frames (round-robin)...");

    std::vector<std::uint32_t> active_sensors;
    for (const auto& [sensor_id, ss] : sensors)
    {
      if (ss.isp0_queue != nullptr)
      {
        active_sensors.push_back(sensor_id);
      }
    }

    std::size_t next_idx = 0;
    std::uint64_t total_frames = 0;
    auto last_log_time = std::chrono::steady_clock::now();

    while (true)
    {
      const std::uint32_t sid = active_sensors[next_idx];
      auto& ss = sensors[sid];

      // Drain ICP queue to prevent buffer starvation
      if (ss.icp_queue != nullptr)
      {
        nvsipl::INvSIPLClient::INvSIPLBuffer* icp_buf = nullptr;
        while (ss.icp_queue->Get(icp_buf, 0U) == nvsipl::NVSIPL_STATUS_OK && icp_buf != nullptr)
        {
          icp_buf->Release();
        }
      }

      nvsipl::INvSIPLClient::INvSIPLBuffer* p_buffer = nullptr;
      auto q_status = ss.isp0_queue->Get(p_buffer, FRAME_TIMEOUT_US);

      if (q_status == nvsipl::NVSIPL_STATUS_OK && p_buffer != nullptr)
      {
        total_frames++;
        p_buffer->Release();
      }
      else if (q_status == nvsipl::NVSIPL_STATUS_EOF)
      {
        LOG_WARN("Sensor %u: got EOF, stopping", sid);
        break;
      }

      next_idx = (next_idx + 1) % active_sensors.size();

      // Periodic FPS logging
      auto now = std::chrono::steady_clock::now();
      auto elapsed = std::chrono::duration_cast<std::chrono::seconds>(now - last_log_time).count();
      if (elapsed >= FPS_LOG_INTERVAL_S)
      {
        double fps = static_cast<double>(total_frames) / static_cast<double>(elapsed);
        LOG_INFO("Frames: %llu total, %.1f fps (over %llds)", static_cast<unsigned long long>(total_frames), fps,
                 static_cast<long long>(elapsed));
        total_frames = 0;
        last_log_time = now;
      }
    }
  }
  catch (const std::exception& e)
  {
    LOG_ERROR("Exception: %s", e.what());
  }

  // 12. Cleanup
  cleanup();

  LOG_INFO("=== SIPL Init Test %s ===", success ? "PASSED" : "FAILED");
  return success ? 0 : 1;
}

Similar behavior noticed with nvsipl_camera as well after successful reboot?

Let me repro the issue on my end. May I know used command and camera module?

Camera module is IMX728.

Used command:

/home/.bin/nvsipl_camera_from_source \
    -c IMX728_96724_RGGB_CPHY_x4 \
    -m "0x1111 0x0000 0x0000 0x1101" \
    --showfps

Important observation: with nvsipl_camera which is included prebuilt into DriveOS I couldn’t reproduce it at all. I am reproducing it with nvsiple_camera built from source or custom minimal implementation you may see above. Also it seems that it really depends on optimisation flags I use when building it: with -O3 optimisations I cannot reproduce it as well, but with -O0 only. Tbh it looks like some race condition which reproduces depending on how fast client code is.

WDYT?

Does this issue block your development? If not, do you see any issue with using -03 in your project?