[Help] DeepStream 7 + nvinferserver SGIE (SCRFD face detector) crashes/doesn’t show boxes; DS resets SGIE to full‑frame and ROI routing seems ignored

I am trying to build a multi-stage inference pipeline in C++ using DeepStream 7.0. My goal is to run a PeopleNet PGIE, followed by an SCRFD Face Detector as an SGIE using nvinferserver.

However, I’m currently stuck because DeepStream is ignoring my SGIE configuration.

DeepStream logs a warning that it is resetting my SGIE from PROCESS_MODE_CLIP_OBJECTS to PROCESS_MODE_FULL_FRAME. This causes the pipeline to either crash intermittently or run without showing any face detections, even though my custom parser logs show it is finding proposals.

The exact same model (scrfd), Triton setup, and custom parser library work perfectly when I configure them as a PGIE. The issue only occurs when trying to run it as an SGIE.

Here is the warning log:

validatePluginConfig:<sgie_facenet> warning: Configuration file process_mode reset to: PROCESS_MODE_FULL_FRAME.

Setup Details:

  • Hardware Platform: Jetson Orin Nano

  • DeepStream SDK: 7.0

  • JetPack Version: 6.0

  • TensorRT Version: 8.6.2.3

  • Application: C++ (using DeepStream Service Maker, main.cpp)

  • Operating System: Ubuntu 22.04 (via JetPack 6.0)

  • CUDA Version: Bundled with JetPack 6.0

  • Triton Server: Running locally (gRPC localhost:8001)

What I’m Using:

  • PGIE Model: PeopleNet (unique-id=1, person class id=0)

  • SGIE Model: SCRFD (unique-id=3), served via Triton

  • Model Input Size: 640×640

  • Model Output Tensors: 9 outputs total (for strides 8/16/32, each has score, bbox, kps tensors)

    • Stride 8 (12800 anchors):
      • score_8 (e.g., Sigmoid output, shape 12800x1)

      • bbox_8 (e.g., Graph output 451, shape 12800x4)

      • kps_8 (e.g., Graph output 454, shape 12800x10)

    • Stride 16 (3200 anchors):
      • score_16 (e.g., Graph output 471, shape 3200x1)

      • bbox_16 (e.g., Graph output 474, shape 3200x4)

      • kps_16 (e.g., Graph output 477, shape 3200x10)

    • Stride 32 (800 anchors):
      • score_32 (e.g., Graph output 494, shape 800x1)

      • bbox_32 (e.g., Graph output 497, shape 800x4)

      • kps_32 (e.g., Graph output 500, shape 800x10)

  • Custom Parser: IInferCustomProcessor implementation (CreateInferServerCustomProcess) in a .so file, loaded via custom-lib-path in the nvinferserver config.

Issue Description:

I am using the C++ Service Maker application (main.cpp). My process is:

  1. I start the service.

  2. I add a PeopleNet pool as the PGIE (unique-id=1).

  3. I set the environment variable DS_ENABLE_FACENET=1. This flag in my main.cpp triggers adding the nvinferserver element as an SGIE (sgie_facenet, unique-id=3).

  4. The config_sgie_scrfd.txt file is loaded, which explicitly sets process_mode: PROCESS_MODE_CLIP_OBJECTS and operate_on_gie_id: 1.

  5. DeepStream immediately prints the warning that it has reset process_mode to FULL_FRAME.

  6. My custom parser (scrfd_custom_process.cpp) logs confirm it is not receiving the OPTION_NVDS_OBJ_META_LIST from the inOptions and is falling back to full-frame mode.

  7. This fallback is unstable, leading to intermittent crashes or no face detections being attached.

As stated, the model and parser logic are correct, as they work perfectly in PGIE mode. The issue seems to be how nvinferserver handles (or ignores) the input_control settings when used as an SGIE.

DeepStream Pipeline (main.cpp snippet)

This is the logic in my main.cpp that adds the SGIE when DS_ENABLE_FACENET=1.

C++

// ... PGIE (uid=1) and optional Tracker/Analytics are added first ...

        // Optional FaceNet SGIE (secondary) inserted after PGIE/tracker
        std::string facenet_cfg_path;
        if (enable_facenet) { // This is true
            const char* facenet_cfg_env = std::getenv("DS_FACENET_CONFIG_FILE");
            // This resolves to "config_sgie_scrfd.txt"
            facenet_cfg_path = resolve_path(facenet_cfg_env ? std::string(facenet_cfg_env) : std::string("custom_logic/config_sgie_facenet.txt"));
            
            std::cout << "[" << model_name << "] Enabling FaceNet SGIE with config: " << facenet_cfg_path << "\n";
            
            // Add the nvinferserver element for the SGIE
            p->add("nvinferserver", "sgie_facenet");
            auto sgie_elem = (*p)["sgie_facenet"];
            sgie_elem.set("config-file-path", facenet_cfg_path.c_str());
            sgie_elem.set("unique-id", 3);
            // We rely 100% on the config file for input_control (process_mode, operate_on)
        }

        // ... Pipeline linking logic follows ...

config_sgie_scrfd.txt (Gst-nvinferserver configuration)

This is the config file for the sgie_facenet element.

Protobuf text format

# Secondary GIE (SGIE) for SCRFD face detection via Triton

infer_config {
  unique_id: 3
  gpu_ids: [0]
  max_batch_size: 4

  backend {
    triton {
      model_name: "scrfd"
      version: -1
      grpc { url: "localhost:8001" }
    }
  }

  preprocess {
    network_format: IMAGE_FORMAT_BGR
    tensor_order: TENSOR_ORDER_LINEAR
    maintain_aspect_ratio: 1
    symmetric_padding: 1
    normalize { scale_factor: 1.0 }
  }

  postprocess { other {} }

  extra {
    # DS7 uses a misspelled key ("funcion") for the custom processor symbol
    custom_process_funcion: "CreateInferServerCustomProcess"
  }
  custom_lib {
    path: "/data/triton_models/dynamic_ds_service_cpp/build/libnvdsinferserver_custom_process_scrfd.so"
  }
}

input_control {
  # Run as SGIE on object crops coming from PGIE/tracker
  process_mode: PROCESS_MODE_CLIP_OBJECTS
  interval: 0

  # Operate on PGIE with unique_id=1 (PeopleNet/YOLO primary)
  operate_on_gie_id: 1
  # Operate on person class from PGIE (PeopleNet/COCO typically class 0)
  operate_on_class_ids: [0]
}

output_control {
  output_tensor_meta: false
}

scrfd_custom_process.cpp (custom parser)

This is the full custom processor code for nvinferserver. It includes the fallback logic to manually filter by PGIE ROIs when OPTION_NVDS_OBJ_META_LIST is missing.

C++

/*
 * SCRFD custom postprocess for DeepStream nvinferserver (Triton backend)
 * - Assumes 9 outputs: for strides 8/16/32, each has {score(1), bbox(4), kps(10)} tensors
 * - Generates anchors (2 per location), decodes boxes/keypoints, rescales to frame, NMS, and
 * attaches NvDsObjectMeta (class "face") to NvDsFrameMeta.
 *
 * Notes:
 * - Keep thresholds/runtime tunables here for quick iteration.
 * - For DS7, ensure bInferDone is set so nvtracker knows this is a detector frame.
 * - Landmarks: decoded but not attached as a special DS7 field (no stable landmark field). You
 * can attach them as user meta if needed later.
 */
#include <string.h>
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

#include "nvdsinferserver/infer_custom_process.h"
#include "nvbufsurface.h"
#include "nvdsmeta.h"

namespace dsis = nvdsinferserver;

struct Anchor { float cx, cy, w, h; };

struct Proposal {
  NvDsInferObjectDetectionInfo rect; // left, top, width, height, classId
  float score;
  float landmarks[10]; // 5 points (x,y)
};

static float iou_rect(const NvDsInferObjectDetectionInfo& a,
                      const NvDsInferObjectDetectionInfo& b) {
  float x1 = std::max(a.left, b.left);
  float y1 = std::max(a.top,  b.top);
  float x2 = std::min(a.left + a.width,  b.left + b.width);
  float y2 = std::min(a.top  + a.height, b.top  + b.height);
  float iw = std::max(0.0f, x2 - x1);
  float ih = std::max(0.0f, y2 - y1);
  float inter = iw * ih;
  float ua = a.width * a.height + b.width * b.height - inter;
  return ua > 0.0f ? inter / ua : 0.0f;
}

static std::vector<Proposal> nms(std::vector<Proposal>& boxes, float iou_thr) {
  std::vector<Proposal> out;
  if (boxes.empty()) return out;
  std::sort(boxes.begin(), boxes.end(), [](auto& x, auto& y){ return x.score > y.score; });
  std::vector<char> sup(boxes.size(), 0);
  for (size_t i = 0; i < boxes.size(); ++i) {
    if (sup[i]) continue;
    out.push_back(boxes[i]);
    for (size_t j = i + 1; j < boxes.size(); ++j) {
      if (!sup[j] && iou_rect(boxes[i].rect, boxes[j].rect) > iou_thr) sup[j] = 1;
    }
  }
  return out;
}

static void generate_anchors(int net_w, int net_h, int stride,
                             const std::vector<float>& sizes,
                             std::vector<Anchor>& anchors) {
  int fw = net_w / stride;
  int fh = net_h / stride;
  anchors.reserve(anchors.size() + (size_t)fw * fh * sizes.size());
  for (int y = 0; y < fh; ++y) {
    for (int x = 0; x < fw; ++x) {
      float cx = (x + 0.5f) * stride;
      float cy = (y + 0.5f) * stride;
      for (float s : sizes) anchors.push_back({cx, cy, s, s});
    }
  }
}

class NvInferServerCustomProcess : public dsis::IInferCustomProcessor {
public:
  ~NvInferServerCustomProcess() override = default;

  void supportInputMemType(dsis::InferMemType& type) override { type = dsis::InferMemType::kCpu; }
  bool requireInferLoop() const override { return false; }
  NvDsInferStatus extraInputProcess(const std::vector<dsis::IBatchBuffer*>&,
                                    std::vector<dsis::IBatchBuffer*>&,
                                    const dsis::IOptions*) override { return NVDSINFER_SUCCESS; }
  void notifyError(NvDsInferStatus) override {}

  NvDsInferStatus inferenceDone(const dsis::IBatchArray* outputs,
                                const dsis::IOptions* inOptions) override;
private:
  NvDsInferStatus attachObjMeta(const dsis::IOptions* inOptions,
                                const std::vector<Proposal>& props,
                                uint32_t batchIdx);

  const std::vector<std::string> kLabels = { "face" };
  // SCRFD usually uses two anchors per location; we'll keep two "priors" per cell
  // but decode boxes as distances (ltrb) from center.
  const std::vector<float> kSizesS8  = {1.0f, 1.0f};
  const std::vector<float> kSizesS16 = {1.0f, 1.0f};
  const std::vector<float> kSizesS32 = {1.0f, 1.0f};
};

NvDsInferStatus NvInferServerCustomProcess::inferenceDone(
  const dsis::IBatchArray* outputs, const dsis::IOptions* inOptions)
{
  if (!outputs || outputs->getSize() != 9) {
    std::cerr << "[scrfd] Expected 9 outputs, got "
              << (outputs ? (int)outputs->getSize() : -1) << "\n";
    return NVDSINFER_CUSTOM_LIB_FAILED;
  }

  // Determine effective batch size using metadata preference order:
  // SGIE object meta list > frame meta list > surface params list > stream ids > tensor batch size
  // Note: OPTION_NVDS_SREAM_IDS ("SREAM") is the SDK header's own spelling of this key
  std::vector<uint64_t> streamIds; inOptions->getValueArray(OPTION_NVDS_SREAM_IDS, streamIds);
  std::vector<NvBufSurfaceParams*> surfParamsList; inOptions->getValueArray(OPTION_NVDS_BUF_SURFACE_PARAMS_LIST, surfParamsList);
  std::vector<NvDsFrameMeta*> frameMetaList; inOptions->getValueArray(OPTION_NVDS_FRAME_META_LIST, frameMetaList);
  std::vector<NvDsObjectMeta*> objMetaList; if (inOptions->hasValue(OPTION_NVDS_OBJ_META_LIST)) {
    inOptions->getValueArray(OPTION_NVDS_OBJ_META_LIST, objMetaList);
  }

  uint32_t B = 0;
  if (!objMetaList.empty()) B = static_cast<uint32_t>(objMetaList.size());
  else if (!frameMetaList.empty()) B = static_cast<uint32_t>(frameMetaList.size());
  else if (!surfParamsList.empty()) B = static_cast<uint32_t>(surfParamsList.size());
  else if (!streamIds.empty()) B = static_cast<uint32_t>(streamIds.size());
  else if (outputs && outputs->getSize() > 0) {
    auto* buf0 = outputs->getBuffer(0);
    if (buf0) B = buf0->getBatchSize();
  }
  if (B == 0) {
    // No frames in this callback (can happen in DS), nothing to do.
    return NVDSINFER_SUCCESS;
  }

  // Tunables
  // Per-stride confidence thresholds and candidate caps to curb over-detections.
  // If SGIE requested but receives no ROIs (DS fallback to full-frame), relax thresholds a bit.
  bool hasObjListKey = inOptions->hasValue(OPTION_NVDS_OBJ_META_LIST);
  bool sgie_no_roi = hasObjListKey && objMetaList.empty();
  int64_t stage_uid_dbg = 0; inOptions->getInt(OPTION_NVDS_UNIQUE_ID, stage_uid_dbg);
  bool is_secondary = (stage_uid_dbg != 1);
  bool sgie_fullframe = (is_secondary && !hasObjListKey);
  std::cerr << "[scrfd] mode: hasObjListKey=" << (hasObjListKey?1:0)
            << " sgie_no_roi=" << (sgie_no_roi?1:0)
            << " sgie_fullframe=" << (sgie_fullframe?1:0)
            << " B=" << B << "\n";
  float conf_thr_s[3]; // s8, s16, s32
  if (sgie_no_roi || sgie_fullframe) {
    conf_thr_s[0] = 0.45f; conf_thr_s[1] = 0.40f; conf_thr_s[2] = 0.30f; // more permissive for full-frame
  } else {
    conf_thr_s[0] = 0.75f; conf_thr_s[1] = 0.65f; conf_thr_s[2] = 0.50f; // tighter for PGIE/true ROI
  }
  const int   topk_s[3]     = {150,   100,    50};   // fewer candidates per stride before decode
  const float nms_iou       = 0.30f;                 // stricter NMS
  const float min_face      = (sgie_no_roi || sgie_fullframe) ? 16.0f : 24.0f; // allow smaller faces in full-frame sec
  const int   max_total_out = (sgie_no_roi || sgie_fullframe) ? 250 : 150; // allow a few more in fallback/full-frame sec
  const int   net_w = 640, net_h = 640; // assumed
  const int   strides[3] = {8, 16, 32};

  // Anchors
  std::vector<Anchor> A8, A16, A32;
  generate_anchors(net_w, net_h, strides[0], kSizesS8,  A8);
  generate_anchors(net_w, net_h, strides[1], kSizesS16, A16);
  generate_anchors(net_w, net_h, strides[2], kSizesS32, A32);
  const std::vector<const std::vector<Anchor>*> AGRIDS = { &A8, &A16, &A32 };

  for (uint32_t b = 0; b < B; ++b) {
    std::vector<Proposal> props;

    // Frame dimensions always refer to the original full frame (for clamping and offsets)
    bool sgie_mode = (!objMetaList.empty());
    uint32_t fidx = (!frameMetaList.empty() ? (sgie_mode ? 0u : std::min<uint32_t>(b, frameMetaList.size()-1)) : 0u);
    float frame_w = (float)net_w, frame_h = (float)net_h;
    if (b < surfParamsList.size() && surfParamsList[b]) {
      frame_w = (float)surfParamsList[b]->width;
      frame_h = (float)surfParamsList[b]->height;
    } else if (!frameMetaList.empty() && frameMetaList[fidx]) {
      // Prefer source frame dimensions from frame meta when surface params are unavailable
      frame_w = (float)frameMetaList[fidx]->source_frame_width;
      frame_h = (float)frameMetaList[fidx]->source_frame_height;
    }

    // For SGIE (clip objects), compute mapping with respect to the parent object's ROI
    bool sgie_mode_roi = (!objMetaList.empty() && b < objMetaList.size() && objMetaList[b]);
    float roi_x = 0.0f, roi_y = 0.0f, roi_w = frame_w, roi_h = frame_h;
    if (sgie_mode_roi) {
      const NvDsObjectMeta* parent = objMetaList[b];
      const NvOSD_RectParams& pr = parent->rect_params;
      roi_x = pr.left; roi_y = pr.top; roi_w = pr.width; roi_h = pr.height;
      // Clamp to frame bounds defensively
      roi_x = std::max(0.0f, std::min(roi_x, frame_w));
      roi_y = std::max(0.0f, std::min(roi_y, frame_h));
      roi_w = std::max(1.0f, std::min(roi_w, frame_w - roi_x));
      roi_h = std::max(1.0f, std::min(roi_h, frame_h - roi_y));
    }

    // Preprocess mapping (maintain_aspect_ratio & symmetric_padding) from ROI to network
    float r = std::min((float)net_w / roi_w, (float)net_h / roi_h);
    float pad_x = (net_w - roi_w * r) * 0.5f;
    float pad_y = (net_h - roi_h * r) * 0.5f;

    for (int s = 0; s < 3; ++s) {
      const int stride = strides[s];
      const auto& anchors = *AGRIDS[s];
      int N = (int)anchors.size();
      auto* out_sc = outputs->getBuffer(s*3 + 0);
      auto* out_bb = outputs->getBuffer(s*3 + 1);
      auto* out_kp = outputs->getBuffer(s*3 + 2);
      if (!out_sc || !out_bb || !out_kp) {
        std::cerr << "[scrfd] missing output buffer at scale index " << s << "\n";
        continue;
      }
      // Compute per-frame element counts from tensor dims to avoid overruns if shapes differ
      auto elems_from_dims = [](const dsis::IBatchBuffer* buf) -> int {
        auto d = buf->getBufDesc().dims;
        long long prod = 1;
        for (int i = 0; i < d.numDims; ++i) prod *= std::max(1, d.d[i]);
        if (prod <= 0 || prod > INT32_MAX) return 0;
        return (int)prod;
      };
      const int elems_sc = elems_from_dims(out_sc);
      const int elems_bb = elems_from_dims(out_bb);
      const int elems_kp = elems_from_dims(out_kp);
      int n_from_buf = std::min({ elems_sc, (elems_bb > 0 ? elems_bb/4 : 0), (elems_kp > 0 ? elems_kp/10 : 0) });
      if (n_from_buf <= 0) {
        std::cerr << "[scrfd] invalid tensor shapes at scale " << s
                  << " elems_sc=" << elems_sc << " elems_bb=" << elems_bb
                  << " elems_kp=" << elems_kp << "\n";
        continue;
      }
      if (N > n_from_buf) {
        std::cerr << "[scrfd] anchor count(" << N << ") > buffer N(" << n_from_buf
                  << ") at stride s" << stride << "; capping to prevent OOB\n";
        N = n_from_buf;
      }
      // Batchless-output tolerant indexing: prefer b when buffer reports a batch, else index 0
      auto select_index = [&](const dsis::IBatchBuffer* buf, const char* name, bool& ok) -> uint32_t {
        uint32_t bs = buf->getBatchSize();
        if (bs == 0) return 0; // treat as implicit batch-1
        if (b >= bs) {
          std::cerr << "[scrfd] batch index " << b << " out of range (" << bs << ") for " << name
                    << " at scale " << s << "\n";
          ok = false;
          return 0;
        }
        return b;
      };

      // One-time shape print for sanity (before any early-returns)
      static bool kPrintedShapes = false;
      if (!kPrintedShapes) {
        auto ds = out_sc->getBufDesc(); auto db = out_bb->getBufDesc(); auto dk = out_kp->getBufDesc();
        auto pd = [&](const char* tag, const dsis::IBatchBuffer* buf, const auto& d){
          std::cerr << "[scrfd] tensor " << tag << " dims=" << d.dims.numDims << " [";
          for (int i=0;i<d.dims.numDims;++i){ std::cerr << d.dims.d[i] << (i+1<d.dims.numDims?",":""); }
          std::cerr << "] batchReported=" << buf->getBatchSize() << "\n"; };
        pd("score", out_sc, ds); pd("bbox", out_bb, db); pd("kps", out_kp, dk);
        kPrintedShapes = true;
      }

      bool idx_ok = true;
      uint32_t idx_sc = select_index(out_sc, "score", idx_ok);
      uint32_t idx_bb = select_index(out_bb, "bbox", idx_ok);
      uint32_t idx_kp = select_index(out_kp, "kps", idx_ok);
      if (!idx_ok) continue;

      const float* scores_base = static_cast<const float*>(out_sc->getBufPtr(idx_sc));
      const float* bboxes_base = static_cast<const float*>(out_bb->getBufPtr(idx_bb));
      const float* kpses_base  = static_cast<const float*>(out_kp->getBufPtr(idx_kp));
      if (!scores_base || !bboxes_base || !kpses_base) {
        std::cerr << "[scrfd] null tensor ptr(s) at scale " << s << "\n";
        continue;
      }
      // Adjust pointers for implicit-batch layout (concatenated per-frame) when batchReported==0
      const uint32_t bs_sc = out_sc->getBatchSize();
      const uint32_t bs_bb = out_bb->getBatchSize();
      const uint32_t bs_kp = out_kp->getBatchSize();
      const int num_cells = n_from_buf; // anchors per frame at this stride from buffer
      const int step_sc = elems_sc;     // score elements per frame
      const int step_bb = elems_bb;     // bbox float elements per frame
      const int step_kp = elems_kp;     // kps float elements per frame
      const float* scores = scores_base + ((bs_sc == 0) ? (int)b * step_sc : 0);
      const float* bboxes = bboxes_base + ((bs_bb == 0) ? (int)b * step_bb : 0);
      const float* kpses  = kpses_base  + ((bs_kp == 0) ? (int)b * step_kp : 0);

      // Debug: summarize score distribution for this stride once per frame
      float max_sc = 0.0f; int cnt_gt_01 = 0, cnt_gt_thr = 0;
      for (int i = 0; i < N; ++i) {
        float sc = scores[i];
        if (sc > max_sc) max_sc = sc;
        if (sc > 0.10f) ++cnt_gt_01;
        if (sc > conf_thr_s[s]) ++cnt_gt_thr;
      }
      std::cerr << "[scrfd] b=" << b << " s" << stride
                << " N=" << N
                << " max_sc=" << max_sc
                << " gt0.1=" << cnt_gt_01
                << " gt_thr=" << cnt_gt_thr << "\n";

      // Preselect top-K candidates above stride-specific threshold
      std::vector<std::pair<float,int>> cand;
      cand.reserve(std::min(N, topk_s[s]));
      const float thr = conf_thr_s[s];
      for (int i = 0; i < N; ++i) {
        float score = scores[i];
        if (score >= thr) cand.emplace_back(score, i);
      }
      // If none meet the threshold in fallback mode, still take the top-K by score to allow detections
      if (cand.empty() && (sgie_no_roi || sgie_fullframe)) {
        cand.reserve(std::min(N, topk_s[s]));
        for (int i = 0; i < N; ++i) cand.emplace_back(scores[i], i);
      }
      if ((int)cand.size() > topk_s[s]) {
        std::partial_sort(cand.begin(), cand.begin() + topk_s[s], cand.end(),
                          [](const auto& x, const auto& y){ return x.first > y.first; });
        cand.resize(topk_s[s]);
      } else {
        std::sort(cand.begin(), cand.end(), [](const auto& x, const auto& y){ return x.first > y.first; });
      }

      for (const auto& kv : cand) {
        int i = kv.second;
        float score = kv.first;
        const Anchor& a = anchors[i];

        const float* bd = bboxes + i*4;
        // Decode as distances from center (ltrb)
        float left   = a.cx - bd[0] * stride;
        float top    = a.cy - bd[1] * stride;
        float right  = a.cx + bd[2] * stride;
        float bottom = a.cy + bd[3] * stride;
        float w = std::max(0.0f, right - left);
        float h = std::max(0.0f, bottom - top);

        const float* kd = kpses + i*10;
        float lm[10];
        for (int k = 0; k < 5; ++k) {
          lm[k*2]   = a.cx + kd[k*2]   * stride;
          lm[k*2+1] = a.cy + kd[k*2+1] * stride;
        }

        Proposal p{}; p.score = score; p.rect.classId = 0;
        // Rescale to ROI coordinates and then offset into full-frame coordinates
        float rl = (left - pad_x) / r;
        float rt = (top  - pad_y) / r;
        float rw = w / r;
        float rh = h / r;
        // Map into full-frame
        float fl = roi_x + rl;
        float ft = roi_y + rt;
        p.rect.left   = std::max(0.0f, fl);
        p.rect.top    = std::max(0.0f, ft);
        p.rect.width  = std::min(frame_w - p.rect.left, rw);
        p.rect.height = std::min(frame_h - p.rect.top,  rh);
        for (int k = 0; k < 5; ++k) {
          float lx = roi_x + (lm[k*2]   - pad_x) / r;
          float ly = roi_y + (lm[k*2+1] - pad_y) / r;
          p.landmarks[k*2]   = std::min(frame_w, std::max(0.0f, lx));
          p.landmarks[k*2+1] = std::min(frame_h, std::max(0.0f, ly));
        }
        if (p.rect.width >= min_face && p.rect.height >= min_face) props.emplace_back(p);
      }
    }

    std::cerr << "[scrfd] b=" << b << " props_pre_nms=" << props.size() << "\n";
    auto finals = nms(props, nms_iou);
    if ((int)finals.size() > max_total_out) {
      std::partial_sort(finals.begin(), finals.begin() + max_total_out, finals.end(),
                        [](const Proposal& a, const Proposal& b){ return a.score > b.score; });
      finals.resize(max_total_out);
    }
    std::cerr << "[scrfd] b=" << b << " props_post_nms=" << finals.size() << "\n";
    if (attachObjMeta(inOptions, finals, b) != NVDSINFER_SUCCESS) {
      std::cerr << "[scrfd] attachObjMeta failed for batch index " << b << "\n";
      return NVDSINFER_CUSTOM_LIB_FAILED;
    }
  }
  return NVDSINFER_SUCCESS;
}

NvDsInferStatus NvInferServerCustomProcess::attachObjMeta(
  const dsis::IOptions* inOptions,
  const std::vector<Proposal>& props,
  uint32_t batchIdx)
{
  NvDsBatchMeta* batchMeta = nullptr;
  if (!inOptions->hasValue(OPTION_NVDS_BATCH_META) ||
      inOptions->getObj(OPTION_NVDS_BATCH_META, batchMeta) != NVDSINFER_SUCCESS ||
      !batchMeta) {
    return NVDSINFER_CUSTOM_LIB_FAILED;
  }
  std::vector<NvDsFrameMeta*> frameMetaList;
  inOptions->getValueArray(OPTION_NVDS_FRAME_META_LIST, frameMetaList);
  std::vector<NvDsObjectMeta*> objMetaList;
  if (inOptions->hasValue(OPTION_NVDS_OBJ_META_LIST)) {
    inOptions->getValueArray(OPTION_NVDS_OBJ_META_LIST, objMetaList);
  }

  int64_t unique_id = 0; inOptions->getInt(OPTION_NVDS_UNIQUE_ID, unique_id);
  // Resolve target frame meta robustly for both PGIE (full-frame) and SGIE (clip-objects)
  NvDsFrameMeta* frameMeta = nullptr;
  // For PGIE and SGIE, frameMetaList is expected to be aligned with the batch indices
  if (batchIdx < frameMetaList.size()) frameMeta = frameMetaList[batchIdx];
  if (!frameMeta) {
    std::cerr << "[scrfd] attachObjMeta: missing frameMeta for batchIdx=" << batchIdx
              << " (frames=" << frameMetaList.size() << ", objs=" << objMetaList.size() << ")\n";
    return NVDSINFER_CUSTOM_LIB_FAILED;
  }

  // Decide if this invocation is secondary (SGIE) vs primary (PGIE).
  // Heuristics:
  // - DS sets OPTION_NVDS_OBJ_META_LIST for SGIE clip-objects calls (key present; may be empty)
  // - PGIE in this pipeline uses unique_id == 1
  // Only treat "zero ROIs" as an SGIE condition if OPTION_NVDS_OBJ_META_LIST is actually present.
  const bool hasObjListKey = inOptions->hasValue(OPTION_NVDS_OBJ_META_LIST);
  int64_t stage_uid = 0; inOptions->getInt(OPTION_NVDS_UNIQUE_ID, stage_uid);
  const bool likelyPGIE = (stage_uid == 1);
  const bool isSecondary = (!likelyPGIE);
  const bool sgie_mode = (hasObjListKey && !objMetaList.empty());
  const bool sgie_fullframe = (isSecondary && !hasObjListKey);
  if (hasObjListKey && objMetaList.empty()) {
    // Many DS7 builds reset SGIE to full-frame if clip-objects cannot be honored; in that case,
    // DeepStream still sets the OBJ_META_LIST key but leaves it empty. Proceed as full-frame to
    // avoid silently dropping detections.
    std::cerr << "[scrfd] attachObjMeta: SGIE zero ROIs (uid=" << stage_uid
              << ") ; FALLBACK to full-frame attach (default)\n";
  }
  // If secondary full-frame (key absent), emulate operate-on by filtering proposals to parent ROIs
  std::vector<Proposal> filtered_props;
  filtered_props.reserve(props.size());
  if (sgie_fullframe && frameMeta) {
    // Collect person ROIs from PGIE (class_id==0)
    std::vector<NvOSD_RectParams> parent_rois;
    for (NvDsMetaList* l = frameMeta->obj_meta_list; l != nullptr; l = l->next) {
      NvDsObjectMeta* om = (NvDsObjectMeta*)l->data;
      if (!om) continue;
      if (om->class_id == 0) parent_rois.push_back(om->rect_params);
    }
    if (!parent_rois.empty()) {
      for (const auto& p : props) {
        float cx = p.rect.left + 0.5f * p.rect.width;
        float cy = p.rect.top  + 0.5f * p.rect.height;
        bool inside = false;
        for (const auto& pr : parent_rois) {
          if (cx >= pr.left && cx <= pr.left + pr.width &&
              cy >= pr.top  && cy <= pr.top  + pr.height) { inside = true; break; }
        }
        if (inside) filtered_props.push_back(p);
      }
      std::cerr << "[scrfd] attachObjMeta: filtered by PGIE ROIs: in=" << props.size()
                << " out=" << filtered_props.size() << "\n";
    }
  }
  const std::vector<Proposal>& use_props = (!filtered_props.empty() ? filtered_props : props);
  if (frameMetaList.empty() || (!frameMetaList[0])) {
    std::cerr << "[scrfd] attachObjMeta: no valid frameMeta available; skipping attach (sgie="
              << (sgie_mode?1:0) << ")\n";
    return NVDSINFER_SUCCESS;
  }
  for (const auto& p : use_props) {
    NvDsObjectMeta* om = nvds_acquire_obj_meta_from_pool(batchMeta);
    om->unique_component_id = unique_id;
    om->confidence = p.score;
    om->object_id = UNTRACKED_OBJECT_ID;
    om->class_id = p.rect.classId; // 0 => face

    NvOSD_RectParams& r = om->rect_params;
    r.left = p.rect.left; r.top = p.rect.top;
    r.width = p.rect.width; r.height = p.rect.height;
    r.border_width = 2; r.has_bg_color = 0;
    r.border_color = (NvOSD_ColorParams){0, 1, 0, 1};

    // Minimal text/label to avoid any allocation/free issues inside OSD
    NvOSD_TextParams& t = om->text_params;
    om->obj_label[0] = '\0';
    t.display_text = nullptr;
    t.font_params.font_name = nullptr;
    t.font_params.font_size = 0;
    t.set_bg_clr = 0;
    t.x_offset = 0; t.y_offset = 0;

    // Ensure no mask meta is attached (no-op for this DS version)

    // Important meta
    om->detector_bbox_info.org_bbox_coords.left   = r.left;
    om->detector_bbox_info.org_bbox_coords.top    = r.top;
    om->detector_bbox_info.org_bbox_coords.width  = r.width;
    om->detector_bbox_info.org_bbox_coords.height = r.height;

    // Validate rect to avoid downstream crashes
    if (!(std::isfinite(r.left) && std::isfinite(r.top) && std::isfinite(r.width) && std::isfinite(r.height)) ||
        r.width <= 0.0f || r.height <= 0.0f ||
        r.left >= frameMeta->source_frame_width || r.top >= frameMeta->source_frame_height) {
      // Skip invalid rect; best-effort: do not attach; DS will reclaim meta when frame ends
      continue;
    }

    // Choose frame meta: SGIE uses first (single) frame; PGIE uses batch-aligned frame
    NvDsFrameMeta* tgtFrame = frameMeta;
    if (sgie_mode && !frameMetaList.empty() && frameMetaList[0]) {
      tgtFrame = frameMetaList[0];
    } else if (!sgie_mode && batchIdx < frameMetaList.size() && frameMetaList[batchIdx]) {
      tgtFrame = frameMetaList[batchIdx];
    }
    // Final clamp to frame bounds (integer-safe) before attach
    r.left   = std::max(0.0f, std::min(r.left,   (float)tgtFrame->source_frame_width  - 1.0f));
    r.top    = std::max(0.0f, std::min(r.top,    (float)tgtFrame->source_frame_height - 1.0f));
    r.width  = std::max(1.0f, std::min(r.width,  (float)tgtFrame->source_frame_width  - r.left));
    r.height = std::max(1.0f, std::min(r.height, (float)tgtFrame->source_frame_height - r.top));

    std::cerr << "[scrfd] attach: uid=" << stage_uid << " sgie=" << (sgie_mode?1:0)
              << " frame=" << (void*)tgtFrame
              << " rect=[" << r.left << "," << r.top << "," << r.width << "," << r.height << "]"
              << " conf=" << om->confidence << "\n";
    // Attach to frame (no parent) under meta lock for maximum stability
    nvds_acquire_meta_lock(batchMeta);
    nvds_add_obj_meta_to_frame(tgtFrame, om, nullptr);
    nvds_release_meta_lock(batchMeta);
    std::cerr << "[scrfd] attach: ok\n";
    if (stage_uid == 1) {
      tgtFrame->bInferDone = TRUE; // Only mark PGIE (uid=1) as detector frame
    }
  }
  return NVDSINFER_SUCCESS;
}

extern "C" dsis::IInferCustomProcessor* CreateInferServerCustomProcess(
  const char* /*config*/, uint32_t /*configLen*/) {
  std::cerr << "[scrfd] CreateInferServerCustomProcess() called\n";
  return new NvInferServerCustomProcess();
}

What I Have Tried:

  • Hardening the parser: My scrfd_custom_process.cpp is already built to handle the FULL_FRAME fallback. It manually fetches the PGIE person ROIs from frameMeta->obj_meta_list and filters the face detections. This is unstable and often crashes or fails to attach.

  • Safe Meta Attachment: Using nvds_acquire_meta_lock / nvds_release_meta_lock in the parser when calling nvds_add_obj_meta_to_frame.

  • Validating Rects: Clamping all final coordinates to be within the frame dimensions before attaching metadata.

  • g_object_set: I previously tried to set process-mode and operate-on-gie-id via g_object_set on the nvinferserver element in C++, but DeepStream warned these properties are not supported (which is why I am relying 100% on the config file).

Question:

  1. What is the correct way to make nvinferserver respect PROCESS_MODE_CLIP_OBJECTS when used as an SGIE? Why is it being reset to FULL_FRAME?

  2. Is this a known limitation in DeepStream 7.0 for nvinferserver (compared to the standard nvinfer plugin)?

  3. Under what conditions should my IInferCustomProcessor receive the OPTION_NVDS_OBJ_META_LIST? Is its absence expected once process_mode is reset?

  4. Given the crashes, is my fallback logic or metadata attachment (nvds_add_obj_meta_to_frame) incorrect for an SGIE? Should I be attaching to a parent object instead of the frame?

Please guide me on how to properly configure this nvinferserver SGIE!

Any suggestions or examples will be greatly appreciated.

Thank you!

  1. Please refer to this ready-made nvinferserver back-to-back-detectors sample and /opt/nvidia/deepstream/deepstream/service-maker/sources/apps/cpp/deepstream_test2_app. Did you set the process-mode property in the code? If it still doesn’t work, please make sure facenet_cfg_path exists. Could you share a complete DeepStream log? Since the nvinferserver plugin is open source, you can add logging to check whether process-mode is set to PROCESS_MODE_CLIP_OBJECTS.
  2. No, please refer to the native sample deepstream_test2_app. Alternatively, can you use test2 to reproduce this issue?
  3. From the log, the Face Detector worked as an SGIE; maybe it can’t detect any face on the full frame.

Thank you for your reply and for the suggestions.

I have looked at the deepstream_test2_app example. My application is built on the C++ Service Maker pattern (similar to deepstream-test2-app but using the pipeline.hpp wrapper), which is why I’m adding elements like sgie_facenet in my main.cpp.

To answer your specific questions:

  1. Did you set process-mode property in the code?

    • I have not set process-mode using g_object_set in my C++ code. When I tried this previously, nvinferserver printed a warning that process-mode was not a supported property.

    • Instead, I am setting process-mode: PROCESS_MODE_CLIP_OBJECTS inside my config_sgie_scrfd.txt file, which is passed to the nvinferserver element.

  2. Does facenet_cfg_path exist?

    • Yes, the path is correct. The log I’ve attached below confirms that the file is found and loaded:

      [yolo] Enabling FaceNet SGIE with config: /data/triton_models/dynamic_ds_service_cpp/custom_logic/config_sgie_scrfd.txt

Full Log and New Findings:

As you requested, here is the complete log from startup to the crash. I’ve added my custom parser logs ([scrfd] ...).

Bash

GST_DEBUG=nvtracker:5 DS_ENABLE_TRACKER=1 DS_TRACKER_LL_CONFIG_FILE=/data/triton_models/dynamic_ds_service_cpp/configs/tracker_yolo_nvdcf_perf.yml DS_DEBUG_COUNTS=1 DS_ENABLE_FACENET=1 DS_FACENET_CONFIG_FILE=/data/triton_models/dynamic_ds_service_cpp/custom_logic/config_sgie_scrfd.txt  ./dynamic_ds_service

Starting C++ Service Manager API on http://0.0.0.0:9000
[yolo] Initializing pipeline pool on port 9001...
Initializing GStreamer Backend...!
Add Element ... source_bin
Add Element ... pgie
[yolo] Using tracker ll-lib-file: /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
[yolo] Using tracker ll-config-file: /data/triton_models/dynamic_ds_service_cpp/configs/tracker_yolo_nvdcf_perf.yml
[yolo] Enabling tracker + analytics.
Add Element ... tracker
[yolo] Enabling FaceNet SGIE with config: /data/triton_models/dynamic_ds_service_cpp/custom_logic/config_sgie_scrfd.txt
Add Element ... sgie_facenet
[yolo] Configuring for VISUAL output.
Add Element ... tiler
Add Element ... osd
Add Element ... sink
LINKING: source_bin -> pgie
LINKING: pgie -> tracker
LINKING: tracker -> sgie_facenet
LINKING: sgie_facenet -> tiler
LINKING: tiler -> osd
LINKING: osd -> sink
[roi_policy] Attached to element 'tracker'.

Using winsys: x11 
Civetweb version: v1.16
Server running at port: 9001
INFO: TritonGrpcBackend id:3 initialized for model: scrfd
[scrfd] CreateInferServerCustomProcess() called
0:00:00.853985450 3314625 0xffff380114a0 DEBUG           nvtracker gstnvtracker.cpp:211:gst_nv_tracker_start:<tracker> gstnvtracker: numStreams set as 0...
0:00:00.854391096 3314625 0xffff380114a0 DEBUG           nvtracker gstnvtracker.cpp:222:gst_nv_tracker_start:<tracker> gstnvtracker: batchSize set as 8...
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
[NvTrackerParams::getConfigRoot()] !!![WARNING] Can't open config file (/data/triton_models/dynamic_ds_service_cpp/configs/tracker_yolo_nvdcf_perf.yml). Will go ahead with default values
[NvMultiObjectTracker] Initialized
INFO: TritonGrpcBackend id:1 initialized for model: yolov11
Event Thread Enabled...
Main Loop Running...
uri:/api/v1/stream/add
method:POST
new stream added [0:cam-05:]

(dynamic_ds_service:3314625): GLib-GObject-WARNING **: 18:35:04.161: g_object_set_is_valid_property: object class 'nvv4l2decoder' has no property named 'low-latency-mode'
... (GStreamer warnings) ...
INFO from tiler: Square Grid property enabled, ignoring nvmultistreamtiler rows/columns
... (GStreamer warnings) ...
0:00:06.434915968 3314625 0xfffe2848d400 DEBUG           nvtracker gstnvtracker.cpp:159:gst_nv_tracker_sink_event:<tracker> Pad added 0
... (roi_policy logs) ...

# === THIS IS THE KEY PART ===
[scrfd] mode: hasObjListKey=0 sgie_no_roi=0 sgie_fullframe=1 B=1
[scrfd] tensor score dims=2 [12800,1] batchReported=0
[scrfd] tensor bbox dims=2 [12800,4] batchReported=0
[scrfd] tensor kps dims=2 [12800,10] batchReported=0
[scrfd] b=0 s8 N=12800 max_sc=0.685831 gt0.1=634 gt_thr=28
[scrfd] b=0 s16 N=3200 max_sc=0.214186 gt0.1=14 gt_thr=0
[scrfd] b=0 s32 N=800 max_sc=0.0423399 gt0.1=0 gt_thr=0
[scrfd] b=0 props_pre_nms=129
[scrfd] b=0 props_post_nms=33
Segmentation fault (core dumped)

The log shows two things:

  1. My custom parser log [scrfd] mode: hasObjListKey=0 sgie_no_roi=0 sgie_fullframe=1 suggests that nvinferserver is ignoring my config and running in full-frame mode (not clip-objects). This confirms my original problem.

  2. You suggested “maybe it can’t detect any face on the full frame.” But I don’t think this is the case. My parser is successfully finding faces (props_post_nms=33). The problem is that immediately after it finds these 33 objects, the application crashes with a Segmentation fault (core dumped).

This suggests the problem is likely in my parser’s attachObjMeta function. It seems that when nvinferserver is in this “full-frame” fallback mode, the metadata (like NvDsBatchMeta* or NvDsFrameMeta*) passed to my inferenceDone function is not what my parser expects, causing a crash when I try to attach the 33 detected objects.

My questions are:

  1. Why is nvinferserver ignoring the process_mode in the config file?

  2. Given this crash, is there a known issue with metadata handling in the IInferCustomProcessor when nvinferserver runs as an SGIE in full-frame mode?

Thank you for your help.

  1. From the log in your last comment, there is no “warning: Configuration file process_mode reset to: PROCESS_MODE_FULL_FRAME” being printed. How did you get sgie_fullframe? You can add a log in GstNvInferServerImpl::processBatchMeta to check whether processObjects is called.
  2. If the Face Detector model detects faces on the full frame, you can add object meta to frame meta. nvinferserver also supports SGIE in full-frame mode. Could you use gdb to get a crash stack? Could you simplify the code to narrow down this issue? For example, if you empty attachObjMeta() or inferenceDone(), does the crash persist? Could you check which code line causes the crash?

Thank you for the suggestions! Here’s what I’ve found:

1. processObjects is NOT being called for SGIE

We added extensive logging in our custom processor and confirmed:

  • OPTION_NVDS_OBJ_META_LIST key is absent (hasObjListKey=0) when the SGIE runs.

  • This indicates nvinferserver is NOT calling our processor in clip-objects mode, despite our config setting:

    Protobuf text

    input_control {
      process_mode: PROCESS_MODE_CLIP_OBJECTS
      operate_on_gie_id: 1
      operate_on_class_ids: [0]
    }
    
    
  • Instead, our logs confirm it’s running in full-frame mode (sgie_fullframe=1).
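For completeness, here is the overall shape of config_sgie_scrfd.txt (the library path is a placeholder and the preprocessing values are from memory; the field names follow my reading of the nvinferserver protobuf schema, including the SDK's own spelling of custom_process_funcion):

```
infer_config {
  unique_id: 3
  gpu_ids: [0]
  backend {
    triton {
      model_name: "scrfd"
      version: -1
      grpc { url: "localhost:8001" }
    }
  }
  preprocess {
    maintain_aspect_ratio: 1
    symmetric_padding: 1
  }
  postprocess { other {} }   # parsing is done in the custom processor
  custom_lib { path: "/path/to/libscrfd_custom_process.so" }
  extra {
    custom_process_funcion: "CreateInferServerCustomProcess"
  }
}
input_control {
  process_mode: PROCESS_MODE_CLIP_OBJECTS
  operate_on_gie_id: 1
  operate_on_class_ids: [0]
}
```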

2. Crash location

We isolated the crash by:

  • a) Emptying inferenceDone(): No crash - pipeline runs fine.

  • b) Returning early from attachObjMeta(): No crash - faces are decoded but not attached.

  • c) Calling nvds_add_obj_meta_to_frame() in SGIE full-frame context: ❌ Immediate segfault.

3. Exact crash line identified

The crash occurs at this line in our custom processor:

C++

nvds_add_obj_meta_to_frame(tgtFrame, om, parent_obj);

Crash happens in 3 scenarios:

  1. parent_obj = nullptr (null parent)

  2. parent_obj = <valid pointer> (matched parent “person” object from batch meta)

  3. Only works when NOT called from SGIE context (PGIE mode works fine)

4. Log at Point of Crash:

This is our stdout log immediately before the crash.

[scrfd] attach: uid=3 sgie=0 isSec=1 frame=0xfffe4c04dca0 parent=0xfffe4c04dca0 rect=[1049.83,299.372,453.461,562.687] conf=0.0221886
Segmentation fault (core dumped)

The crash happens immediately after our log line shows the attach parameters, confirming it’s inside nvds_add_obj_meta_to_frame() itself.

5. Questions

  1. Why does DeepStream appear to ignore my config?

  2. Is calling nvds_add_obj_meta_to_frame() from a custom processor in SGIE full-frame mode supported? It crashes regardless of the parent pointer value.

  3. Should we use a different API to attach detector objects from SGIE custom processors? Or is this workflow fundamentally unsupported?

6. Workaround attempt

Since processObjects isn’t being called, we tried manually gathering parent “person” objects from batchMeta->frame_meta_list and matching faces by containment, but nvds_add_obj_meta_to_frame() still crashes.

Here is the full code for the custom parser we are using, which includes the debug flags (SCRFD_NO_ATTACH, etc.) and the logic to find parent objects.

Full scrfd_custom_process.cpp (The code we are using)

/*
 * SCRFD custom postprocess for DeepStream nvinferserver (Triton backend)
 * - Assumes 9 outputs: for strides 8/16/32, each has {score(1), bbox(4), kps(10)} tensors
 * - Generates anchors (2 per location), decodes boxes/keypoints, rescales to frame, NMS, and
 * attaches NvDsObjectMeta (class "face") to NvDsFrameMeta.
 *
 * Notes:
 * - Keep thresholds/runtime tunables here for quick iteration.
 * - For DS7, ensure bInferDone is set so nvtracker knows this is a detector frame.
 * - Landmarks: decoded but not attached as a special DS7 field (no stable landmark field). You
 * can attach them as user meta if needed later.
 */
#include <string.h>
#include <algorithm>
#include <cmath>
#include <cstdint>   // INT32_MAX
#include <cstdlib>   // std::getenv
#include <iostream>
#include <string>
#include <vector>

#include "nvdsinferserver/infer_custom_process.h"
#include "nvbufsurface.h"
#include "nvdsmeta.h"

namespace dsis = nvdsinferserver;

struct Anchor { float cx, cy, w, h; };

struct Proposal {
  NvDsInferObjectDetectionInfo rect; // left, top, width, height, classId
  float score;
  float landmarks[10]; // 5 points (x,y)
};

static float iou_rect(const NvDsInferObjectDetectionInfo& a,
                      const NvDsInferObjectDetectionInfo& b) {
  float x1 = std::max(a.left, b.left);
  float y1 = std::max(a.top,  b.top);
  float x2 = std::min(a.left + a.width,  b.left + b.width);
  float y2 = std::min(a.top  + a.height, b.top  + b.height);
  float iw = std::max(0.0f, x2 - x1);
  float ih = std::max(0.0f, y2 - y1);
  float inter = iw * ih;
  float ua = a.width * a.height + b.width * b.height - inter;
  return ua > 0.0f ? inter / ua : 0.0f;
}

static std::vector<Proposal> nms(std::vector<Proposal>& boxes, float iou_thr) {
  std::vector<Proposal> out;
  if (boxes.empty()) return out;
  std::sort(boxes.begin(), boxes.end(), [](auto& x, auto& y){ return x.score > y.score; });
  std::vector<char> sup(boxes.size(), 0);
  for (size_t i = 0; i < boxes.size(); ++i) {
    if (sup[i]) continue;
    out.push_back(boxes[i]);
    for (size_t j = i + 1; j < boxes.size(); ++j) {
      if (!sup[j] && iou_rect(boxes[i].rect, boxes[j].rect) > iou_thr) sup[j] = 1;
    }
  }
  return out;
}

static void generate_anchors(int net_w, int net_h, int stride,
                             const std::vector<float>& sizes,
                             std::vector<Anchor>& anchors) {
  int fw = net_w / stride;
  int fh = net_h / stride;
  anchors.reserve(anchors.size() + (size_t)fw * fh * sizes.size());
  for (int y = 0; y < fh; ++y) {
    for (int x = 0; x < fw; ++x) {
      float cx = (x + 0.5f) * stride;
      float cy = (y + 0.5f) * stride;
      for (float s : sizes) anchors.push_back({cx, cy, s, s});
    }
  }
}

class NvInferServerCustomProcess : public dsis::IInferCustomProcessor {
public:
  ~NvInferServerCustomProcess() override = default;

  void supportInputMemType(dsis::InferMemType& type) override { type = dsis::InferMemType::kCpu; }
  bool requireInferLoop() const override { return false; }
  NvDsInferStatus extraInputProcess(const std::vector<dsis::IBatchBuffer*>&,
                                    std::vector<dsis::IBatchBuffer*>&,
                                    const dsis::IOptions*) override { return NVDSINFER_SUCCESS; }
  void notifyError(NvDsInferStatus) override {}

  NvDsInferStatus inferenceDone(const dsis::IBatchArray* outputs,
                                const dsis::IOptions* inOptions) override;
private:
  NvDsInferStatus attachObjMeta(const dsis::IOptions* inOptions,
                                const std::vector<Proposal>& props,
                                uint32_t batchIdx);

  const std::vector<std::string> kLabels = { "face" };
  // SCRFD usually uses two anchors per location; we'll keep two "priors" per cell
  // but decode boxes as distances (ltrb) from center.
  const std::vector<float> kSizesS8  = {1.0f, 1.0f};
  const std::vector<float> kSizesS16 = {1.0f, 1.0f};
  const std::vector<float> kSizesS32 = {1.0f, 1.0f};
};

NvDsInferStatus NvInferServerCustomProcess::inferenceDone(
  const dsis::IBatchArray* outputs, const dsis::IOptions* inOptions)
{
  // Allow early exit to isolate crashes per moderator guidance
  if (const char* early = std::getenv("SCRFD_EARLY_EXIT")) {
    if (*early) {
      std::cerr << "[scrfd] SCRFD_EARLY_EXIT=1 set; skipping decode/attach.\n";
      return NVDSINFER_SUCCESS;
    }
  }
  if (!outputs || outputs->getSize() != 9) {
    std::cerr << "[scrfd] Expected 9 outputs, got "
              << (outputs ? (int)outputs->getSize() : -1) << "\n";
    return NVDSINFER_CUSTOM_LIB_FAILED;
  }

  // Determine effective batch size using metadata preference order:
  // SGIE object meta list > frame meta list > surface params list > stream ids > tensor batch size
  std::vector<uint64_t> streamIds;
  inOptions->getValueArray(OPTION_NVDS_SREAM_IDS, streamIds); // (sic: SDK macro name)
  std::vector<NvBufSurfaceParams*> surfParamsList;
  inOptions->getValueArray(OPTION_NVDS_BUF_SURFACE_PARAMS_LIST, surfParamsList);
  std::vector<NvDsFrameMeta*> frameMetaList;
  inOptions->getValueArray(OPTION_NVDS_FRAME_META_LIST, frameMetaList);
  std::vector<NvDsObjectMeta*> objMetaList;
  if (inOptions->hasValue(OPTION_NVDS_OBJ_META_LIST)) {
    inOptions->getValueArray(OPTION_NVDS_OBJ_META_LIST, objMetaList);
  }

  uint32_t B = 0;
  if (!objMetaList.empty()) B = static_cast<uint32_t>(objMetaList.size());
  else if (!frameMetaList.empty()) B = static_cast<uint32_t>(frameMetaList.size());
  else if (!surfParamsList.empty()) B = static_cast<uint32_t>(surfParamsList.size());
  else if (!streamIds.empty()) B = static_cast<uint32_t>(streamIds.size());
  else if (outputs && outputs->getSize() > 0) {
    auto* buf0 = outputs->getBuffer(0);
    if (buf0) B = buf0->getBatchSize();
  }
  if (B == 0) {
    // No frames in this callback (can happen in DS), nothing to do.
    return NVDSINFER_SUCCESS;
  }

  // Tunables
  // Per-stride confidence thresholds and candidate caps to curb over-detections.
  // If SGIE requested but receives no ROIs (DS fallback to full-frame), relax thresholds a bit.
  bool hasObjListKey = inOptions->hasValue(OPTION_NVDS_OBJ_META_LIST);
  bool sgie_no_roi = hasObjListKey && objMetaList.empty();
  int64_t stage_uid_dbg = 0; inOptions->getInt(OPTION_NVDS_UNIQUE_ID, stage_uid_dbg);
  bool is_secondary = (stage_uid_dbg != 1);
  bool sgie_fullframe = (is_secondary && !hasObjListKey);
  std::cerr << "[scrfd] mode: hasObjListKey=" << (hasObjListKey?1:0)
            << " sgie_no_roi=" << (sgie_no_roi?1:0)
            << " sgie_fullframe=" << (sgie_fullframe?1:0)
            << " B=" << B << "\n";
  float conf_thr_s[3]; // s8, s16, s32
  if (sgie_no_roi || sgie_fullframe) {
    conf_thr_s[0] = 0.45f; conf_thr_s[1] = 0.40f; conf_thr_s[2] = 0.30f; // more permissive for full-frame
  } else {
    conf_thr_s[0] = 0.75f; conf_thr_s[1] = 0.65f; conf_thr_s[2] = 0.50f; // tighter for PGIE/true ROI
  }
  const int   topk_s[3]     = {150,   100,    50};   // fewer candidates per stride before decode
  const float nms_iou       = 0.30f;                 // stricter NMS
  const float min_face      = (sgie_no_roi || sgie_fullframe) ? 16.0f : 24.0f; // allow smaller faces in full-frame sec
  const int   max_total_out = (sgie_no_roi || sgie_fullframe) ? 250 : 150; // allow a few more in fallback/full-frame sec
  const int   net_w = 640, net_h = 640; // assumed
  const int   strides[3] = {8, 16, 32};

  // Anchors
  std::vector<Anchor> A8, A16, A32;
  generate_anchors(net_w, net_h, strides[0], kSizesS8,  A8);
  generate_anchors(net_w, net_h, strides[1], kSizesS16, A16);
  generate_anchors(net_w, net_h, strides[2], kSizesS32, A32);
  const std::vector<const std::vector<Anchor>*> AGRIDS = { &A8, &A16, &A32 };

  for (uint32_t b = 0; b < B; ++b) {
    std::vector<Proposal> props;

    // Frame dimensions always refer to the original full frame (for clamping and offsets)
    bool sgie_mode = (!objMetaList.empty());
    uint32_t fidx = (!frameMetaList.empty() ? (sgie_mode ? 0u : std::min<uint32_t>(b, frameMetaList.size()-1)) : 0u);
    float frame_w = (float)net_w, frame_h = (float)net_h;
    if (b < surfParamsList.size() && surfParamsList[b]) {
      frame_w = (float)surfParamsList[b]->width;
      frame_h = (float)surfParamsList[b]->height;
    } else if (!frameMetaList.empty() && frameMetaList[fidx]) {
      // Prefer source frame dimensions from frame meta when surface params are unavailable
      frame_w = (float)frameMetaList[fidx]->source_frame_width;
      frame_h = (float)frameMetaList[fidx]->source_frame_height;
    }

    // For SGIE (clip objects), compute mapping with respect to the parent object's ROI
    bool sgie_mode_roi = (!objMetaList.empty() && b < objMetaList.size() && objMetaList[b]);
    float roi_x = 0.0f, roi_y = 0.0f, roi_w = frame_w, roi_h = frame_h;
    if (sgie_mode_roi) {
      const NvDsObjectMeta* parent = objMetaList[b];
      const NvOSD_RectParams& pr = parent->rect_params;
      roi_x = pr.left; roi_y = pr.top; roi_w = pr.width; roi_h = pr.height;
      // Clamp to frame bounds defensively
      roi_x = std::max(0.0f, std::min(roi_x, frame_w));
      roi_y = std::max(0.0f, std::min(roi_y, frame_h));
      roi_w = std::max(1.0f, std::min(roi_w, frame_w - roi_x));
      roi_h = std::max(1.0f, std::min(roi_h, frame_h - roi_y));
    }

    // Preprocess mapping (maintain_aspect_ratio & symmetric_padding) from ROI to network
    float r = std::min((float)net_w / roi_w, (float)net_h / roi_h);
    float pad_x = (net_w - roi_w * r) * 0.5f;
    float pad_y = (net_h - roi_h * r) * 0.5f;

    for (int s = 0; s < 3; ++s) {
      const int stride = strides[s];
      const auto& anchors = *AGRIDS[s];
      int N = (int)anchors.size();
      auto* out_sc = outputs->getBuffer(s*3 + 0);
      auto* out_bb = outputs->getBuffer(s*3 + 1);
      auto* out_kp = outputs->getBuffer(s*3 + 2);
      if (!out_sc || !out_bb || !out_kp) {
        std::cerr << "[scrfd] missing output buffer at scale index " << s << "\n";
        continue;
      }
      // Compute per-frame element counts from tensor dims to avoid overruns if shapes differ
      auto elems_from_dims = [](const dsis::IBatchBuffer* buf) -> int {
        auto d = buf->getBufDesc().dims;
        long long prod = 1;
        for (int i = 0; i < d.numDims; ++i) prod *= std::max(1, d.d[i]);
        if (prod <= 0 || prod > INT32_MAX) return 0;
        return (int)prod;
      };
      const int elems_sc = elems_from_dims(out_sc);
      const int elems_bb = elems_from_dims(out_bb);
      const int elems_kp = elems_from_dims(out_kp);
      int n_from_buf = std::min({ elems_sc, (elems_bb > 0 ? elems_bb/4 : 0), (elems_kp > 0 ? elems_kp/10 : 0) });
      if (n_from_buf <= 0) {
        std::cerr << "[scrfd] invalid tensor shapes at scale " << s
                  << " elems_sc=" << elems_sc << " elems_bb=" << elems_bb
                  << " elems_kp=" << elems_kp << "\n";
        continue;
      }
      if (N > n_from_buf) {
        std::cerr << "[scrfd] anchor count(" << N << ") > buffer N(" << n_from_buf
                  << ") at stride s" << stride << "; capping to prevent OOB\n";
        N = n_from_buf;
      }
      // Batchless-output tolerant indexing: prefer b when buffer reports a batch, else index 0
      auto select_index = [&](const dsis::IBatchBuffer* buf, const char* name, bool& ok) -> uint32_t {
        uint32_t bs = buf->getBatchSize();
        if (bs == 0) return 0; // treat as implicit batch-1
        if (b >= bs) {
          std::cerr << "[scrfd] batch index " << b << " out of range (" << bs << ") for " << name
                    << " at scale " << s << "\n";
          ok = false;
          return 0;
        }
        return b;
      };

      // One-time shape print for sanity (before any early-returns)
      static bool kPrintedShapes = false;
      if (!kPrintedShapes) {
        auto ds = out_sc->getBufDesc(); auto db = out_bb->getBufDesc(); auto dk = out_kp->getBufDesc();
        auto pd = [&](const char* tag, const dsis::IBatchBuffer* buf, const auto& d){
          std::cerr << "[scrfd] tensor " << tag << " dims=" << d.dims.numDims << " [";
          for (int i=0;i<d.dims.numDims;++i){ std::cerr << d.dims.d[i] << (i+1<d.dims.numDims?",":""); }
          std::cerr << "] batchReported=" << buf->getBatchSize() << "\n"; };
        pd("score", out_sc, ds); pd("bbox", out_bb, db); pd("kps", out_kp, dk);
        kPrintedShapes = true;
      }

      bool idx_ok = true;
      uint32_t idx_sc = select_index(out_sc, "score", idx_ok);
      uint32_t idx_bb = select_index(out_bb, "bbox", idx_ok);
      uint32_t idx_kp = select_index(out_kp, "kps", idx_ok);
      if (!idx_ok) continue;

      const float* scores_base = static_cast<const float*>(out_sc->getBufPtr(idx_sc));
      const float* bboxes_base = static_cast<const float*>(out_bb->getBufPtr(idx_bb));
      const float* kpses_base  = static_cast<const float*>(out_kp->getBufPtr(idx_kp));
      if (!scores_base || !bboxes_base || !kpses_base) {
        std::cerr << "[scrfd] null tensor ptr(s) at scale " << s << "\n";
        continue;
      }
      // Adjust pointers for implicit-batch layout (concatenated per-frame) when batchReported==0
      const uint32_t bs_sc = out_sc->getBatchSize();
      const uint32_t bs_bb = out_bb->getBatchSize();
      const uint32_t bs_kp = out_kp->getBatchSize();
      const int num_cells = n_from_buf; // anchors per frame at this stride from buffer
      const int step_sc = elems_sc;     // score elements per frame
      const int step_bb = elems_bb;     // bbox floats per frame
      const int step_kp = elems_kp;     // kps floats per frame
      const float* scores = scores_base + ((bs_sc == 0) ? (int)b * step_sc : 0);
      const float* bboxes = bboxes_base + ((bs_bb == 0) ? (int)b * step_bb : 0);
      const float* kpses  = kpses_base  + ((bs_kp == 0) ? (int)b * step_kp : 0);

      // Debug: summarize score distribution for this stride once per frame
      float max_sc = 0.0f; int cnt_gt_01 = 0, cnt_gt_thr = 0;
      for (int i = 0; i < N; ++i) {
        float sc = scores[i];
        if (sc > max_sc) max_sc = sc;
        if (sc > 0.10f) ++cnt_gt_01;
        if (sc > conf_thr_s[s]) ++cnt_gt_thr;
      }
      std::cerr << "[scrfd] b=" << b << " s" << stride
                << " N=" << N
                << " max_sc=" << max_sc
                << " gt0.1=" << cnt_gt_01
                << " gt_thr=" << cnt_gt_thr << "\n";

      // Preselect top-K candidates above stride-specific threshold
      std::vector<std::pair<float,int>> cand;
      cand.reserve(std::min(N, topk_s[s]));
      const float thr = conf_thr_s[s];
      for (int i = 0; i < N; ++i) {
        float score = scores[i];
        if (score >= thr) cand.emplace_back(score, i);
      }
      // If none meet the threshold in fallback mode, still take the top-K by score to allow detections
      if (cand.empty() && (sgie_no_roi || sgie_fullframe)) {
        cand.reserve(std::min(N, topk_s[s]));
        for (int i = 0; i < N; ++i) cand.emplace_back(scores[i], i);
      }
      if ((int)cand.size() > topk_s[s]) {
        std::partial_sort(cand.begin(), cand.begin() + topk_s[s], cand.end(),
                          [](const auto& x, const auto& y){ return x.first > y.first; });
        cand.resize(topk_s[s]);
      } else {
        std::sort(cand.begin(), cand.end(), [](const auto& x, const auto& y){ return x.first > y.first; });
      }

      for (const auto& kv : cand) {
        int i = kv.second;
        float score = kv.first;
        const Anchor& a = anchors[i];

        const float* bd = bboxes + i*4;
        // Decode as distances from center (ltrb)
        float left   = a.cx - bd[0] * stride;
        float top    = a.cy - bd[1] * stride;
        float right  = a.cx + bd[2] * stride;
        float bottom = a.cy + bd[3] * stride;
        float w = std::max(0.0f, right - left);
        float h = std::max(0.0f, bottom - top);

        const float* kd = kpses + i*10;
        float lm[10];
        for (int k = 0; k < 5; ++k) {
          lm[k*2]   = a.cx + kd[k*2]   * stride;
          lm[k*2+1] = a.cy + kd[k*2+1] * stride;
        }

        Proposal p{}; p.score = score; p.rect.classId = 0;
        // Rescale to ROI coordinates and then offset into full-frame coordinates
        float rl = (left - pad_x) / r;
        float rt = (top  - pad_y) / r;
        float rw = w / r;
        float rh = h / r;
        // Map into full-frame
        float fl = roi_x + rl;
        float ft = roi_y + rt;
        p.rect.left   = std::max(0.0f, fl);
        p.rect.top    = std::max(0.0f, ft);
        p.rect.width  = std::min(frame_w - p.rect.left, rw);
        p.rect.height = std::min(frame_h - p.rect.top,  rh);
        for (int k = 0; k < 5; ++k) {
          float lx = roi_x + (lm[k*2]   - pad_x) / r;
          float ly = roi_y + (lm[k*2+1] - pad_y) / r;
          p.landmarks[k*2]   = std::min(frame_w, std::max(0.0f, lx));
          p.landmarks[k*2+1] = std::min(frame_h, std::max(0.0f, ly));
        }
        if (p.rect.width >= min_face && p.rect.height >= min_face) props.emplace_back(p);
      }
    }

    std::cerr << "[scrfd] b=" << b << " props_pre_nms=" << props.size() << "\n";
    auto finals = nms(props, nms_iou);
    if ((int)finals.size() > max_total_out) {
      std::partial_sort(finals.begin(), finals.begin() + max_total_out, finals.end(),
                        [](const Proposal& a, const Proposal& b){ return a.score > b.score; });
      finals.resize(max_total_out);
    }
    std::cerr << "[scrfd] b=" << b << " props_post_nms=" << finals.size() << "\n";
    if (attachObjMeta(inOptions, finals, b) != NVDSINFER_SUCCESS) {
      std::cerr << "[scrfd] attachObjMeta failed for batch index " << b << "\n";
      return NVDSINFER_CUSTOM_LIB_FAILED;
    }
  }
  return NVDSINFER_SUCCESS;
}

NvDsInferStatus NvInferServerCustomProcess::attachObjMeta(
  const dsis::IOptions* inOptions,
  const std::vector<Proposal>& props,
  uint32_t batchIdx)
{
  // Optional debug gates to help isolate crashes in the field.
  const char* no_attach_env = std::getenv("SCRFD_NO_ATTACH");
  const char* simple_attach_env = std::getenv("SCRFD_ATTACH_SIMPLE");
  const bool NO_ATTACH = (no_attach_env && (*no_attach_env != '\0'));
  const bool SIMPLE_ATTACH = (simple_attach_env && (*simple_attach_env != '\0'));

  if (NO_ATTACH) {
    std::cerr << "[scrfd] attachObjMeta: SCRFD_NO_ATTACH=1 set; skipping meta attach (props="
              << props.size() << ")\n";
    return NVDSINFER_SUCCESS;
  }
  NvDsBatchMeta* batchMeta = nullptr;
  if (!inOptions->hasValue(OPTION_NVDS_BATCH_META) ||
      inOptions->getObj(OPTION_NVDS_BATCH_META, batchMeta) != NVDSINFER_SUCCESS ||
      !batchMeta) {
    std::cerr << "[scrfd] attachObjMeta: missing batchMeta (hasKey="
              << (inOptions->hasValue(OPTION_NVDS_BATCH_META)?1:0) << ")\n";
    return NVDSINFER_CUSTOM_LIB_FAILED;
  }
  std::vector<NvDsFrameMeta*> frameMetaList;
  inOptions->getValueArray(OPTION_NVDS_FRAME_META_LIST, frameMetaList);
  std::vector<NvDsObjectMeta*> objMetaList;
  if (inOptions->hasValue(OPTION_NVDS_OBJ_META_LIST)) {
    inOptions->getValueArray(OPTION_NVDS_OBJ_META_LIST, objMetaList);
  }

  int64_t unique_id = 0; inOptions->getInt(OPTION_NVDS_UNIQUE_ID, unique_id);
  // Resolve target frame meta robustly for both PGIE (full-frame) and SGIE (clip-objects)
  NvDsFrameMeta* frameMeta = nullptr;
  // For PGIE and SGIE, frameMetaList is expected to be aligned with the batch indices
  if (batchIdx < frameMetaList.size()) frameMeta = frameMetaList[batchIdx];
  if (!frameMeta) {
    std::cerr << "[scrfd] attachObjMeta: missing frameMeta for batchIdx=" << batchIdx
              << " (frames=" << frameMetaList.size() << ", objs=" << objMetaList.size() << ")\n";
    return NVDSINFER_CUSTOM_LIB_FAILED;
  }

  // Decide if this invocation is secondary (SGIE) vs primary (PGIE).
  // Heuristics:
  // - DS sets OPTION_NVDS_OBJ_META_LIST for SGIE clip-objects calls (key present; may be empty)
  // - PGIE in this pipeline uses unique_id == 1
  // Only treat "zero ROIs" as an SGIE condition if OPTION_NVDS_OBJ_META_LIST is actually present.
  const bool hasObjListKey = inOptions->hasValue(OPTION_NVDS_OBJ_META_LIST);
  int64_t stage_uid = 0; inOptions->getInt(OPTION_NVDS_UNIQUE_ID, stage_uid);
  const bool likelyPGIE = (stage_uid == 1);
  const bool isSecondary = (!likelyPGIE);
  const bool sgie_mode = (hasObjListKey && !objMetaList.empty());
  const bool sgie_fullframe = (isSecondary && !hasObjListKey);
  if (hasObjListKey && objMetaList.empty()) {
    // Many DS7 builds reset SGIE to full-frame if clip-objects cannot be honored; in that case,
    // DeepStream still sets the OBJ_META_LIST key but leaves it empty. Proceed as full-frame to
    // avoid silently dropping detections.
    std::cerr << "[scrfd] attachObjMeta: SGIE zero ROIs (uid=" << stage_uid
              << ") ; FALLBACK to full-frame attach (default)\n";
  }
  // If secondary full-frame (key absent), skip ROI filtering temporarily to isolate crash
  std::vector<Proposal> filtered_props;
  std::cerr << "[scrfd] attachObjMeta: SGIE sgie_mode=" << (sgie_mode?1:0)
            << " fullframe=" << (sgie_fullframe?1:0)
            << " props=" << props.size() << "\n";
  
  // For SGIE clip-objects: we have parent objects in objMetaList; attach faces as children
  // For SGIE full-frame: treat similar to clip-objects by gathering parent objects from batch meta
  // For PGIE: attach faces directly to frame
  
  // Collect all parent person objects for full-frame SGIE matching
  std::vector<NvDsObjectMeta*> parent_objs_fullframe;
  if (sgie_fullframe && batchMeta && batchMeta->frame_meta_list) {
    std::cerr << "[scrfd] attachObjMeta: SGIE full-frame; gathering parent objects from batch meta\n";
    for (NvDsMetaList* fl = batchMeta->frame_meta_list; fl != nullptr; fl = fl->next) {
      if (!fl->data) continue;
      NvDsFrameMeta* fm = (NvDsFrameMeta*)fl->data;
      if (!fm || !fm->obj_meta_list) continue;
      for (NvDsMetaList* ol = fm->obj_meta_list; ol != nullptr; ol = ol->next) {
        if (!ol->data) continue;
        NvDsObjectMeta* om = (NvDsObjectMeta*)ol->data;
        if (om->class_id == 0) parent_objs_fullframe.push_back(om); // person class
      }
    }
    std::cerr << "[scrfd] attachObjMeta: found " << parent_objs_fullframe.size() << " parent person objects\n";
  }
  
  const std::vector<Proposal>& use_props = props;
  if (frameMetaList.empty() || (!frameMetaList[0])) {
    std::cerr << "[scrfd] attachObjMeta: no valid frameMeta available; skipping attach (sgie="
              << (sgie_mode?1:0) << ")\n";
    return NVDSINFER_SUCCESS;
  }
  int attach_count = 0;
  const bool USE_LOCK = (std::getenv("SCRFD_META_LOCK") != nullptr);
  for (size_t idx = 0; idx < use_props.size(); ++idx) {
    const auto& p = use_props[idx];
    
    // For SGIE clip-objects: determine parent from objMetaList
    // For SGIE full-frame: match face to parent person by containment
    NvDsObjectMeta* parent_obj = nullptr;
    if (sgie_mode && idx < objMetaList.size()) {
      parent_obj = objMetaList[idx];
    } else if (sgie_fullframe && !parent_objs_fullframe.empty()) {
      // Match this face to a parent person (center-point containment)
      float cx = p.rect.left + 0.5f * p.rect.width;
      float cy = p.rect.top + 0.5f * p.rect.height;
      for (auto* po : parent_objs_fullframe) {
        const auto& pr = po->rect_params;
        if (cx >= pr.left && cx <= pr.left + pr.width &&
            cy >= pr.top && cy <= pr.top + pr.height) {
          parent_obj = po;
          break;
        }
      }
    }
    
    NvDsObjectMeta* om = nvds_acquire_obj_meta_from_pool(batchMeta);
    if (!om) {
      std::cerr << "[scrfd] attachObjMeta: nvds_acquire_obj_meta_from_pool returned null\n";
      break;
    }
    om->unique_component_id = unique_id;
    om->confidence = p.score;
    om->object_id = UNTRACKED_OBJECT_ID;
    om->class_id = p.rect.classId; // 0 => face

    NvOSD_RectParams& r = om->rect_params;
    r.left = p.rect.left; r.top = p.rect.top;
    r.width = p.rect.width; r.height = p.rect.height;
    r.border_width = 2; r.has_bg_color = 0;
    r.border_color = (NvOSD_ColorParams){0, 1, 0, 1};

    // Minimal text/label to avoid any allocation/free issues inside OSD
    NvOSD_TextParams& t = om->text_params;
    om->obj_label[0] = '\0';
    t.display_text = nullptr;
    t.font_params.font_name = nullptr;
    t.font_params.font_size = 0;
    t.set_bg_clr = 0;
    t.x_offset = 0; t.y_offset = 0;

    // Ensure no mask meta is attached (no-op for this DS version)

    // Important meta
    om->detector_bbox_info.org_bbox_coords.left   = r.left;
    om->detector_bbox_info.org_bbox_coords.top    = r.top;
    om->detector_bbox_info.org_bbox_coords.width  = r.width;
    om->detector_bbox_info.org_bbox_coords.height = r.height;

    // Validate rect to avoid downstream crashes
    if (!(std::isfinite(r.left) && std::isfinite(r.top) && std::isfinite(r.width) && std::isfinite(r.height)) ||
        r.width <= 0.0f || r.height <= 0.0f ||
        r.left >= frameMeta->source_frame_width || r.top >= frameMeta->source_frame_height) {
      // Skip invalid rect; best-effort: do not attach; DS will reclaim meta when frame ends
      continue;
    }

    // Choose frame meta: for SGIE (clip-objects or full-frame), use frameMetaList[0]; for PGIE use batch-aligned
    NvDsFrameMeta* tgtFrame = frameMeta; // fallback
    if (isSecondary && !frameMetaList.empty() && frameMetaList[0]) {
      // SGIE (both sgie_mode and sgie_fullframe): always use first frame
      tgtFrame = frameMetaList[0];
    } else if (!isSecondary && batchIdx < frameMetaList.size() && frameMetaList[batchIdx]) {
      // PGIE: use batch-aligned frame
      tgtFrame = frameMetaList[batchIdx];
    }
    if (!tgtFrame) {
      std::cerr << "[scrfd] attach: tgtFrame is null; skipping object\n";
      // DS will reclaim meta when batch ends; no explicit release needed
      continue;
    }
    // Final clamp to frame bounds (integer-safe) before attach
    r.left   = std::max(0.0f, std::min(r.left,   (float)tgtFrame->source_frame_width  - 1.0f));
    r.top    = std::max(0.0f, std::min(r.top,    (float)tgtFrame->source_frame_height - 1.0f));
    r.width  = std::max(1.0f, std::min(r.width,  (float)tgtFrame->source_frame_width  - r.left));
    r.height = std::max(1.0f, std::min(r.height, (float)tgtFrame->source_frame_height - r.top));

    std::cerr << "[scrfd] attach: uid=" << stage_uid << " sgie=" << (sgie_mode?1:0)
              << " isSec=" << (isSecondary?1:0)
              << " frame=" << (void*)tgtFrame
              << " parent=" << (void*)parent_obj
              << " rect=[" << r.left << "," << r.top << "," << r.width << "," << r.height << "]"
              << " conf=" << om->confidence << "\n";
    
    // Attach: SGIE clip-objects passes parent_obj, PGIE passes nullptr
    // Note: SGIE full-frame with parent_obj may crash in DS 7.0
    if (NO_ATTACH) {
      std::cerr << "[scrfd] attach: SCRFD_NO_ATTACH set; skipping nvds_add_obj_meta_to_frame\n";
    } else {
      if (USE_LOCK) nvds_acquire_meta_lock(batchMeta);
      nvds_add_obj_meta_to_frame(tgtFrame, om, parent_obj);
      if (USE_LOCK) nvds_release_meta_lock(batchMeta);
    }
    ++attach_count;
    if (SIMPLE_ATTACH && attach_count >= 8) {
      std::cerr << "[scrfd] attachObjMeta: SIMPLE mode stop after " << attach_count << " attaches\n";
      break;
    }
    if (stage_uid == 1) {
      tgtFrame->bInferDone = TRUE; // Only mark PGIE (uid=1) as detector frame
    }
  }
  std::cerr << "[scrfd] attachObjMeta: attached objects = " << attach_count << " / " << use_props.size() << "\n";
  return NVDSINFER_SUCCESS;
}

extern "C" dsis::IInferCustomProcessor* CreateInferServerCustomProcess(
  const char* /*config*/, uint32_t /*configLen*/) {
  std::cerr << "[scrfd] CreateInferServerCustomProcess() called\n";
  return new NvInferServerCustomProcess();
}
  1. Regarding “Why does DeepStream appear to ignore my config?”: to help reproduce this issue, could you provide a simplified project? For simplicity, you can modify a native DeepStream sample.
  2. If the SGIE works in full-frame mode, parent_obj should be set to null when calling nvds_add_obj_meta_to_frame(), because the objects have no parent object.

Hi,

I’ve reproduced the nvds_add_obj_meta_to_frame crash with a minimal 60-line custom processor.

The same API call works perfectly when this processor is run as a PGIE (uid=1), but causes an immediate segmentation fault when run as an SGIE (uid=3).

The SGIE also seems to ignore PROCESS_MODE_CLIP_OBJECTS and runs in full-frame mode, but the main issue is the crash.


1. Minimal Custom Processor (ultra_minimal.cpp)

This processor does no inference. It just tries to attach one dummy object.

C++

/*
 * ULTRA MINIMAL - Just to test if SGIE context allows nvds_add_obj_meta_to_frame
 * Does NO inference, NO processing - just creates one dummy box and tries to attach
 */
#include "nvdsinferserver/infer_custom_process.h"
#include "nvdsmeta.h"
#include <iostream>

namespace dsis = nvdsinferserver;

class UltraMinimal : public dsis::IInferCustomProcessor {
public:
  void supportInputMemType(dsis::InferMemType& t) override { t = dsis::InferMemType::kCpu; }
  bool requireInferLoop() const override { return false; }
  NvDsInferStatus extraInputProcess(const std::vector<dsis::IBatchBuffer*>&,
                                    std::vector<dsis::IBatchBuffer*>&,
                                    const dsis::IOptions*) override { return NVDSINFER_SUCCESS; }
  void notifyError(NvDsInferStatus) override {}
  
  NvDsInferStatus inferenceDone(const dsis::IBatchArray*, const dsis::IOptions* opts) override {
    std::cerr << "[ULTRA] inferenceDone called\n";
    
    NvDsBatchMeta* batch = nullptr;
    if (opts->getObj(OPTION_NVDS_BATCH_META, batch) != NVDSINFER_SUCCESS || !batch) return NVDSINFER_SUCCESS;
    
    std::vector<NvDsFrameMeta*> frames;
    opts->getValueArray(OPTION_NVDS_FRAME_META_LIST, frames);
    if (frames.empty()) return NVDSINFER_SUCCESS;
    
    int64_t uid = 0; opts->getInt(OPTION_NVDS_UNIQUE_ID, uid);
    bool hasObj = opts->hasValue(OPTION_NVDS_OBJ_META_LIST);
    std::cerr << "[ULTRA] uid=" << uid << " hasObjKey=" << hasObj << "\n";
    
    // Create ONE dummy object
    NvDsObjectMeta* om = nvds_acquire_obj_meta_from_pool(batch);
    om->unique_component_id = uid;
    om->confidence = 0.99f;
    om->class_id = 0;
    om->object_id = UNTRACKED_OBJECT_ID;
    om->rect_params.left = 200;
    om->rect_params.top = 200;
    om->rect_params.width = 100;
    om->rect_params.height = 100;
    
    std::cerr << "[ULTRA] Calling nvds_add_obj_meta_to_frame...\n";
    nvds_add_obj_meta_to_frame(frames[0], om, nullptr); // <-- CRASHES HERE
    std::cerr << "[ULTRA] SUCCESS!\n";
    
    return NVDSINFER_SUCCESS;
  }
};

extern "C" dsis::IInferCustomProcessor* CreateInferServerCustomProcess(const char*, uint32_t) {
  std::cerr << "[ULTRA] CreateInferServerCustomProcess\n";
  return new UltraMinimal();
}


2. Build Command

Bash

g++ -std=c++17 -g -O2 -shared -fPIC ultra_minimal.cpp \
  -o libnvdsinferserver_ultra_minimal.so \
  -I/usr/local/cuda/include \
  -I/opt/nvidia/deepstream/deepstream/sources/includes \
  -I/opt/nvidia/deepstream/deepstream/sources/includes/nvdsinferserver \
  $(pkg-config --cflags --libs glib-2.0 gstreamer-1.0)


3. SGIE Config (sgie_ultra_config.txt)

This is run as an SGIE after a PGIE (YOLO) and nvtracker.

Protocol Buffers

infer_config {
  unique_id: 3
  gpu_ids: [0]
  max_batch_size: 1
  backend {
    triton {
      model_name: "scrfd" # A valid Triton model, but its output isn't used
      version: -1
      grpc { url: "localhost:8001" }
    }
  }
  postprocess { other {} } # Enables custom processing
  extra {
    custom_process_funcion: "CreateInferServerCustomProcess"
  }
  custom_lib {
    # Path to the .so built above
    path: "/data/triton_models/standalone-repro-package/custom_processor/libnvdsinferserver_ultra_minimal.so"
  }
}

# This entire block seems to be ignored by nvinferserver
input_control {
  process_mode: PROCESS_MODE_CLIP_OBJECTS
  interval: 0
  operate_on_gie_id: 1
  operate_on_class_ids: [0]
}

output_control {
  output_tensor_meta: false
}


4. Crash Log

This is the exact output when the pipeline runs.

[ULTRA] CreateInferServerCustomProcess
[ULTRA] inferenceDone called
[ULTRA] uid=3 hasObjKey=0                <-- Confirms it's SGIE (uid=3) and running in full-frame (hasObjKey=0)
[ULTRA] Calling nvds_add_obj_meta_to_frame...
Segmentation fault (core dumped)         <-- CRASH! "[ULTRA] SUCCESS!" never prints


5. Question

Why does nvds_add_obj_meta_to_frame(frames[0], om, nullptr) cause a segmentation fault when called from an nvinferserver SGIE (uid=3), but work perfectly when called from a PGIE (uid=1)?

Is this a known limitation, or is there a different API for attaching metadata from an SGIE custom processor?

Thank you!

On your side, there are two issues: one is that setting PROCESS_MODE_CLIP_OBJECTS does not work; the other is that adding meta crashes when the SGIE works in full-frame mode. I provided a complete sample based on the Service Maker test2 app. From my test, neither of these problems can be reproduced. Please refer to the test details.
deepstream_test2_app.zip (150.1 KB)
test1: set dstest2_sgie1_nvinferserver_config_bk.txt in dstest2_config.yaml. After running, the app printed “uuid:2, processObjects”, which is the log added in nvinferserver’s GstNvInferServerImpl::processObjects.
log-sgie-objects.txt (15.6 KB)
test2: set dstest2_sgie1_nvinferserver_config.txt in dstest2_config.yaml. After running, the app printed “Calling nvds_add_obj_meta_to_frame…” and did not crash.
log-sgie-full.txt (47.7 KB)

Thank you for your help,

Your Test Results Explained

Test 1 (CLIP_OBJECTS worked): Used built-in classifier

Protocol Buffers

postprocess {
  classification {  # ← No custom processor
    threshold: 0.51
  }
}

  • CLIP_OBJECTS works for classifiers

  • No custom_lib section needed

Test 2 (FULL_FRAME worked): Used custom processor

Protocol Buffers

postprocess {
  other {}  # ← Custom processor
}
custom_lib {
  path: "nvdsinferserver_custom_impl_yolo/libnvdstriton_custom_impl_yolo.so"
}

  • FULL_FRAME works with custom processor

  • Your custom processor followed correct pattern

My Model Requirements

SCRFD outputs: 9 tensors (scores, bboxes, keypoints for 3 strides)

  • So I can’t use the built-in classifier; the model requires custom decoding logic

My custom processor was crashing in FULL_FRAME mode due to:

  1. Complex parent object matching logic

C++

// OLD CODE - BROKEN
NvDsObjectMeta* parent_obj = nullptr;
if (sgie_mode && idx < objMetaList.size()) {
  parent_obj = objMetaList[idx];  // ← Trying to use parent in SGIE
} else if (sgie_fullframe && !parent_objs_fullframe.empty()) {
  // Match this face to a parent person by containment
  // ... complex matching logic ...
}
nvds_add_obj_meta_to_frame(tgtFrame, om, parent_obj);  // ← CRASH!

  2. Inconsistent locking and flow control

  3. Trying to handle both CLIP_OBJECTS and FULL_FRAME in one path

Fix

I followed your working YOLO custom processor pattern exactly:

C++

// NEW CODE - WORKS
for (const auto& face : faces) {
    NvDsObjectMeta* objMeta = nvds_acquire_obj_meta_from_pool(batchMeta);
    objMeta->unique_component_id = unique_id;
    objMeta->confidence = face.confidence;
    objMeta->object_id = UNTRACKED_OBJECT_ID;
    objMeta->class_id = 0;
    
    // Set rectangle
    NvOSD_RectParams& rect_params = objMeta->rect_params;
    rect_params.left = face.left;
    rect_params.top = face.top;
    rect_params.width = face.width;
    rect_params.height = face.height;
    rect_params.border_width = 2;
    rect_params.border_color = (NvOSD_ColorParams){0, 1, 0, 1};
    
    // Set text params ...
    
    // Set detector bbox info
    objMeta->detector_bbox_info.org_bbox_coords.left = face.left;
    objMeta->detector_bbox_info.org_bbox_coords.top = face.top;
    objMeta->detector_bbox_info.org_bbox_coords.width = face.width;
    objMeta->detector_bbox_info.org_bbox_coords.height = face.height;
    
    // Attach - ALWAYS use NULL parent like YOLO does
    nvds_acquire_meta_lock(batchMeta);
    nvds_add_obj_meta_to_frame(frameMetaList[batchIdx], objMeta, NULL);
    frameMetaList[batchIdx]->bInferDone = TRUE;
    nvds_release_meta_lock(batchMeta);
}

Key changes:

  1. Always pass NULL as parent (not trying to match parent objects)

  2. Consistent lock/unlock pattern matching YOLO

  3. Simplified flow - removed complex conditional logic

  4. Set bInferDone = TRUE for each frame

Result

  • SCRFD now works in FULL_FRAME mode as SGIE

  • Detects faces on full frames

  • No crashes


Problem: CLIP_OBJECTS Still Won’t Work

Why CLIP_OBJECTS Doesn’t Work

When I configure:

Protocol Buffers

input_control {
  process_mode: PROCESS_MODE_CLIP_OBJECTS
  operate_on_gie_id: 1
  operate_on_class_ids: [0]
}

My custom processor receives:

  • hasObjListKey = 0 (OPTION_NVDS_OBJ_META_LIST key is absent)

  • It runs in full-frame mode anyway

  • (This is the mode that was crashing before I applied the fix)

This is different from Test 1 because:

  • Test 1 used built-in classifier (no custom processor)

  • CLIP_OBJECTS works for classifiers

  • But doesn’t seem to work for detector custom processors using postprocess { other {} }

My Question

Is there a way to make CLIP_OBJECTS work with detector custom processors?

Or is this the expected behavior:

  • CLIP_OBJECTS → Built-in classifiers only

  • FULL_FRAME → Custom processors (detectors)


Questions

  1. Is CLIP_OBJECTS designed only for classifiers?

    • Should detectors with custom processors always use FULL_FRAME?
  2. If I need to process person crops in SGIE, what’s the recommended approach?

    • Manual ROI extraction in FULL_FRAME mode (like the workaround above)?

    • Different plugin/approach?

  3. Is the parent pointer in nvds_add_obj_meta_to_frame() intended for SGIE use?

    • Your YOLO example uses NULL.

    • Should I always use NULL even if I manually find the parent object, to avoid the crash?


Current Working Configuration

SCRFD SGIE Config (FULL_FRAME):

infer_config {
  unique_id: 3
  gpu_ids: [0]
  max_batch_size: 1
  backend {
    triton {
      model_name: "scrfd"
      version: -1
      grpc { url: "localhost:8001" }
    }
  }
  preprocess {
    network_format: IMAGE_FORMAT_BGR
    tensor_order: TENSOR_ORDER_LINEAR
    maintain_aspect_ratio: 1
    symmetric_padding: 1
    normalize { scale_factor: 1.0 }
  }
  postprocess { other {} }
  extra {
    custom_process_funcion: "CreateInferServerCustomProcess"
  }
  custom_lib {
    path: "/path/to/libnvdsinferserver_custom_process_scrfd.so"
  }
}
input_control {
  process_mode: PROCESS_MODE_FULL_FRAME  # CLIP_OBJECTS doesn't work with my custom processor
  interval: 0
}
output_control {
  output_tensor_meta: false
}

Pipeline: source → PGIE (YOLO) → tracker → SGIE (SCRFD) → tiler → OSD → sink


Thank you for your help! I hope you’re having a good day, sorry if my intention wasn’t clear at times.

Could you modify the deepstream_test2_app.zip I shared to reproduce this issue? If so, could you share the code modifications? Thanks! Alternatively, since nvinferserver is open source, you can add logs to check why the mode is not correct.

The thing is, my system is a Jetson Orin (aarch64), and I cannot execute the deepstream-test2-app binary you sent as it is compiled for x86-64.

Bash

$ file deepstream_test2_app/build/deepstream-test2-app
deepstream_test2_app/build/deepstream-test2-app: ELF 64-bit LSB pie executable, x86-64

$ uname -m
aarch64

Could you please provide a build for ARM64 (aarch64)?

I tried to recompile from the source provided, but I am missing some service-maker dependencies.

As an alternative, since I cannot run your testbed, perhaps you could try running my custom processor in your deepstream-test2-app environment? This is the version I’ve confirmed works on my end in FULL_FRAME mode, as it follows the YOLO pattern you suggested (using NULL for the parent object).

If you run this, you should be able to reproduce the final remaining question:

  1. It will work correctly in FULL_FRAME mode.

  2. When PROCESS_MODE_CLIP_OBJECTS is set in the config, it will still run in FULL_FRAME mode (the log will show hasObjKey=0).

This is the core issue I am trying to confirm: whether CLIP_OBJECTS is supported for custom detector processors.

This is my scrfd custom process

/*
 * SCRFD Custom Processor for DeepStream nvinferserver
 * Based on proven YOLO parser pattern
 */
#include <string.h>
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

#include "infer_custom_process.h"
#include "nvbufsurface.h"
#include "nvdsmeta.h"

namespace dsis = nvdsinferserver;

static const std::vector<std::string> kFaceLabels = { "face" };

struct Anchor { float cx, cy; };

struct FaceDetection {
    float left, top, width, height;
    float confidence;
    float landmarks[10];  // 5 keypoints (x,y) - optional
};

static float calculate_iou(const FaceDetection& a, const FaceDetection& b) {
    float x1 = std::max(a.left, b.left);
    float y1 = std::max(a.top, b.top);
    float x2 = std::min(a.left + a.width, b.left + b.width);
    float y2 = std::min(a.top + a.height, b.top + b.height);
    float iw = std::max(0.0f, x2 - x1);
    float ih = std::max(0.0f, y2 - y1);
    float inter = iw * ih;
    float ua = a.width * a.height + b.width * b.height - inter;
    return ua > 0.0f ? inter / ua : 0.0f;
}

static std::vector<FaceDetection> apply_nms(std::vector<FaceDetection>& faces, float iou_thr) {
    std::vector<FaceDetection> out;
    if (faces.empty()) return out;
    std::sort(faces.begin(), faces.end(), [](auto& a, auto& b) { return a.confidence > b.confidence; });
    std::vector<char> suppressed(faces.size(), 0);
    for (size_t i = 0; i < faces.size(); ++i) {
        if (suppressed[i]) continue;
        out.push_back(faces[i]);
        for (size_t j = i + 1; j < faces.size(); ++j) {
            if (!suppressed[j] && calculate_iou(faces[i], faces[j]) > iou_thr) {
                suppressed[j] = 1;
            }
        }
    }
    return out;
}

class NvInferServerCustomProcess : public dsis::IInferCustomProcessor {
public:
    ~NvInferServerCustomProcess() override = default;

    void supportInputMemType(dsis::InferMemType& type) override { 
        type = dsis::InferMemType::kCpu; 
    }

    bool requireInferLoop() const override { return false; }

    NvDsInferStatus extraInputProcess(
        const std::vector<dsis::IBatchBuffer*>&,
        std::vector<dsis::IBatchBuffer*>&,
        const dsis::IOptions*) override {
        return NVDSINFER_SUCCESS;
    }

    NvDsInferStatus inferenceDone(
        const dsis::IBatchArray* outputs, const dsis::IOptions* opts) override;

    void notifyError(NvDsInferStatus) override {}

private:
    NvDsInferStatus attachObjMeta(
        const dsis::IOptions* opts,
        const std::vector<FaceDetection>& faces,
        uint32_t batchIdx,
        NvDsObjectMeta* parentObj = nullptr);
};

NvDsInferStatus NvInferServerCustomProcess::inferenceDone(
    const dsis::IBatchArray* outputs, const dsis::IOptions* opts)
{
    if (!outputs || outputs->getSize() != 9) {
        std::cerr << "[scrfd] Expected 9 outputs, got " << (outputs ? outputs->getSize() : 0) << "\n";
        return NVDSINFER_CUSTOM_LIB_FAILED;
    }

    // Check if we're in CLIP_OBJECTS mode (SGIE with parent objects)
    bool hasObjListKey = opts->hasValue(OPTION_NVDS_OBJ_META_LIST);
    std::vector<NvDsObjectMeta*> parentObjList;
    if (hasObjListKey) {
        opts->getValueArray(OPTION_NVDS_OBJ_META_LIST, parentObjList);
    }
    
    bool isClipObjectsMode = hasObjListKey && !parentObjList.empty();
    std::cerr << "[scrfd] Mode: " << (isClipObjectsMode ? "CLIP_OBJECTS" : "FULL_FRAME") 
              << " (hasObjKey=" << hasObjListKey << ", parents=" << parentObjList.size() << ")\n";

    // Get surface params for frame dimensions
    std::vector<NvBufSurfaceParams*> surfParamsList;
    if (opts->hasValue(OPTION_NVDS_BUF_SURFACE_PARAMS_LIST)) {
        opts->getValueArray(OPTION_NVDS_BUF_SURFACE_PARAMS_LIST, surfParamsList);
    }
    if (surfParamsList.empty()) {
        std::cerr << "[scrfd] No surface params\n";
        return NVDSINFER_CUSTOM_LIB_FAILED;
    }

    uint32_t batchSize = isClipObjectsMode ? parentObjList.size() : surfParamsList.size();
    std::cerr << "[scrfd] Processing batch size: " << batchSize << "\n";
    
    // Network parameters
    const int net_w = 640, net_h = 640;
    const int strides[3] = {8, 16, 32};
    const float conf_thr = 0.5f;
    const float nms_iou = 0.4f;

    // Generate anchors for each stride
    std::vector<std::vector<Anchor>> all_anchors(3);
    for (int s = 0; s < 3; ++s) {
        int fw = net_w / strides[s];
        int fh = net_h / strides[s];
        for (int y = 0; y < fh; ++y) {
            for (int x = 0; x < fw; ++x) {
                float cx = (x + 0.5f) * strides[s];
                float cy = (y + 0.5f) * strides[s];
                all_anchors[s].push_back({cx, cy});
            }
        }
    }

    // Process each batch element
    for (uint32_t b = 0; b < batchSize; ++b) {
        // In CLIP_OBJECTS mode, get parent person's ROI for coordinate transformation
        float roi_x = 0.0f, roi_y = 0.0f, roi_w = 0.0f, roi_h = 0.0f;
        NvDsObjectMeta* parentObj = nullptr;
        
        if (isClipObjectsMode) {
            parentObj = parentObjList[b];
            roi_x = parentObj->rect_params.left;
            roi_y = parentObj->rect_params.top;
            roi_w = parentObj->rect_params.width;
            roi_h = parentObj->rect_params.height;
            std::cerr << "[scrfd] Person " << b << " ROI: [" << roi_x << "," << roi_y << "," << roi_w << "," << roi_h << "]\n";
        } else {
            // FULL_FRAME mode: use full frame dimensions
            uint32_t surf_idx = std::min(b, (uint32_t)surfParamsList.size() - 1);
            roi_w = surfParamsList[surf_idx]->width;
            roi_h = surfParamsList[surf_idx]->height;
        }

        // Compute preprocessing params (maintain_aspect_ratio + symmetric_padding)
        float r = std::min((float)net_w / roi_w, (float)net_h / roi_h);
        float pad_x = (net_w - roi_w * r) * 0.5f;
        float pad_y = (net_h - roi_h * r) * 0.5f;

        std::vector<FaceDetection> raw_faces;

        // Decode SCRFD outputs: for each stride, get score + bbox + kps
        for (int s = 0; s < 3; ++s) {
            const int stride = strides[s];
            const auto& anchors = all_anchors[s];

            auto* out_sc = outputs->getBuffer(s * 3 + 0);  // score
            auto* out_bb = outputs->getBuffer(s * 3 + 1);  // bbox
            auto* out_kp = outputs->getBuffer(s * 3 + 2);  // keypoints

            if (!out_sc || !out_bb) continue;

            uint32_t batch_idx = (out_sc->getBatchSize() > 0) ? std::min(b, out_sc->getBatchSize() - 1) : 0;
            const float* scores = static_cast<const float*>(out_sc->getBufPtr(batch_idx));
            const float* bboxes = static_cast<const float*>(out_bb->getBufPtr(batch_idx));
            const float* keypts = out_kp ? static_cast<const float*>(out_kp->getBufPtr(batch_idx)) : nullptr;

            if (!scores || !bboxes) continue;

            int N = anchors.size();
            for (int i = 0; i < N; ++i) {
                float score = scores[i];
                if (score < conf_thr) continue;

                const Anchor& a = anchors[i];
                const float* bd = bboxes + i * 4;

                // Decode as distances from anchor center (ltrb)
                float left = a.cx - bd[0] * stride;
                float top = a.cy - bd[1] * stride;
                float right = a.cx + bd[2] * stride;
                float bottom = a.cy + bd[3] * stride;
                float w = right - left;
                float h = bottom - top;

                // Map from network coordinates to ROI coordinates
                float roi_left = (left - pad_x) / r;
                float roi_top = (top - pad_y) / r;
                float roi_width = w / r;
                float roi_height = h / r;

                // Map from ROI to full frame coordinates
                float frame_left = roi_x + roi_left;
                float frame_top = roi_y + roi_top;
                float frame_width = roi_width;
                float frame_height = roi_height;

                // Get frame dimensions for clamping
                uint32_t surf_idx = isClipObjectsMode ? 0 : std::min(b, (uint32_t)surfParamsList.size() - 1);
                float frame_w = surfParamsList[surf_idx]->width;
                float frame_h = surfParamsList[surf_idx]->height;

                // Clamp to frame bounds
                frame_left = std::max(0.0f, std::min(frame_left, frame_w - 1.0f));
                frame_top = std::max(0.0f, std::min(frame_top, frame_h - 1.0f));
                frame_width = std::max(1.0f, std::min(frame_width, frame_w - frame_left));
                frame_height = std::max(1.0f, std::min(frame_height, frame_h - frame_top));

                if (frame_width > 10.0f && frame_height > 10.0f) {  // Min face size
                    FaceDetection face;
                    face.left = frame_left;
                    face.top = frame_top;
                    face.width = frame_width;
                    face.height = frame_height;
                    face.confidence = score;

                    // Optionally decode keypoints
                    if (keypts) {
                        const float* kp = keypts + i * 10;
                        for (int k = 0; k < 5; ++k) {
                            float kp_x = (kp[k * 2] * stride + a.cx - pad_x) / r + roi_x;
                            float kp_y = (kp[k * 2 + 1] * stride + a.cy - pad_y) / r + roi_y;
                            face.landmarks[k * 2] = kp_x;
                            face.landmarks[k * 2 + 1] = kp_y;
                        }
                    }

                    raw_faces.push_back(face);
                }
            }
        }

        // Apply NMS
        auto final_faces = apply_nms(raw_faces, nms_iou);

        std::cerr << "[scrfd] Batch " << b << ": " << raw_faces.size() << " raw -> " 
                  << final_faces.size() << " after NMS\n";

        // Attach to batch meta (pass parent object if in CLIP_OBJECTS mode)
        if (attachObjMeta(opts, final_faces, b, parentObj) != NVDSINFER_SUCCESS) {
            return NVDSINFER_CUSTOM_LIB_FAILED;
        }
    }

    return NVDSINFER_SUCCESS;
}

NvDsInferStatus NvInferServerCustomProcess::attachObjMeta(
    const dsis::IOptions* opts,
    const std::vector<FaceDetection>& faces,
    uint32_t batchIdx,
    NvDsObjectMeta* parentObj)
{
    NvDsBatchMeta* batchMeta = nullptr;
    std::vector<NvDsFrameMeta*> frameMetaList;
    int64_t unique_id = 0;

    // Get batch meta
    if (opts->hasValue(OPTION_NVDS_BATCH_META)) {
        if (opts->getObj(OPTION_NVDS_BATCH_META, batchMeta) != NVDSINFER_SUCCESS || !batchMeta) {
            return NVDSINFER_CUSTOM_LIB_FAILED;
        }
    }

    // Get frame meta list
    if (opts->hasValue(OPTION_NVDS_FRAME_META_LIST)) {
        if (opts->getValueArray(OPTION_NVDS_FRAME_META_LIST, frameMetaList) != NVDSINFER_SUCCESS) {
            return NVDSINFER_CUSTOM_LIB_FAILED;
        }
    }
    
    // In CLIP_OBJECTS mode, use frame[0], in FULL_FRAME use batchIdx
    uint32_t frameIdx = parentObj ? 0 : batchIdx;
    if (frameIdx >= frameMetaList.size()) {
        return NVDSINFER_CUSTOM_LIB_FAILED;
    }

    // Get unique ID
    if (opts->hasValue(OPTION_NVDS_UNIQUE_ID)) {
        opts->getInt(OPTION_NVDS_UNIQUE_ID, unique_id);
    }

    std::cerr << "[scrfd] Attaching " << faces.size() << " faces with parent=" 
              << (parentObj ? "yes" : "no") << "\n";

    // Attach each face as object meta
    for (const auto& face : faces) {
        NvDsObjectMeta* objMeta = nvds_acquire_obj_meta_from_pool(batchMeta);
        objMeta->unique_component_id = unique_id;
        objMeta->confidence = face.confidence;
        objMeta->object_id = UNTRACKED_OBJECT_ID;
        objMeta->class_id = 0;  // face class

        // Set rectangle
        NvOSD_RectParams& rect_params = objMeta->rect_params;
        rect_params.left = face.left;
        rect_params.top = face.top;
        rect_params.width = face.width;
        rect_params.height = face.height;
        rect_params.border_width = 2;
        rect_params.has_bg_color = 0;
        rect_params.border_color = (NvOSD_ColorParams){0, 1, 0, 1};  // Green

        // Set text
        NvOSD_TextParams& text_params = objMeta->text_params;
        text_params.display_text = g_strdup("face");
        strncpy(objMeta->obj_label, "face", MAX_LABEL_SIZE - 1);
        objMeta->obj_label[MAX_LABEL_SIZE - 1] = 0;
        
        text_params.x_offset = rect_params.left;
        text_params.y_offset = std::max(0.0f, rect_params.top - 10.0f);
        text_params.set_bg_clr = 1;
        text_params.text_bg_clr = (NvOSD_ColorParams){0, 0, 0, 1};
        text_params.font_params.font_name = (gchar*)"Serif";
        text_params.font_params.font_size = 10;
        text_params.font_params.font_color = (NvOSD_ColorParams){0, 1, 0, 1};

        // Set detector bbox info
        objMeta->detector_bbox_info.org_bbox_coords.left = face.left;
        objMeta->detector_bbox_info.org_bbox_coords.top = face.top;
        objMeta->detector_bbox_info.org_bbox_coords.width = face.width;
        objMeta->detector_bbox_info.org_bbox_coords.height = face.height;

        // Attach to frame with parent object (CLIP_OBJECTS) or null (FULL_FRAME)
        nvds_acquire_meta_lock(batchMeta);
        nvds_add_obj_meta_to_frame(frameMetaList[frameIdx], objMeta, parentObj);
        frameMetaList[frameIdx]->bInferDone = TRUE;
        nvds_release_meta_lock(batchMeta);
    }

    return NVDSINFER_SUCCESS;
}

extern "C" {
dsis::IInferCustomProcessor* CreateInferServerCustomProcess(
    const char* config, uint32_t configLen)
{
    std::cerr << "[scrfd] CreateInferServerCustomProcess() called\n";
    return new NvInferServerCustomProcess();
}
}

Thank you.

Thanks for sharing! There is an nvinferserver bug: the value returned by “opts->getValueArray(OPTION_NVDS_FRAME_META_LIST, frames)” is wrong when nvinferserver works as an SGIE. Since nvinferserver is open source, please apply the following patch, then rebuild and replace the .so according to the README.
In addBatchOptions()
of /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvinferserver/gstnvinferserver_impl.cpp,
modify
option->setValueArray(OPTION_NVDS_FRAME_META_LIST, objMetaList);
to
option->setValueArray(OPTION_NVDS_OBJ_META_LIST, objMetaList);

After applying the patch, nvinferserver works as an SGIE with a custom detector processor. Here is the complete code and log.
deepstream_test2_app.zip (161.2 KB) 1109.txt (15.6 KB)