Compressed Marker Labels Info always returns -1000 (NPP_CUDA_KERNEL_EXECUTION_ERROR)

Hi,
I’m using the (up-to-date) sample code from GitHub and added a call to the function nppiCompressedMarkerLabelsUFInfo_32u_C1R_Ctx in 4_CUDA_Libraries / batchedLabelMarkersAndLabelCompressionNPP / batchedLabelMarkersAndLabelCompressionNPP.cpp, as pasted below.
This code snippet is placed right after the last cudaStreamSynchronize in the images loop at line 484.

While the results seem to be OK (nMarkerLabelPixelCount and oMarkerLabelBoundingBox), the function always returns NPP_CUDA_KERNEL_EXECUTION_ERROR (-1000).

When I use the same code in my own test application, the same error is returned, and the results are zero as well. I will have to do some more checks on my own before posting those, though.

Has anyone worked with these functions already? I don’t think this return value is OK. Is something missing, or am I doing something wrong?

I would really appreciate some sample code on GitHub for these functions, especially for the optional data, which is NULL in my sample code (I guess I will need it later in my project).

Here is the code:


      // Query the buffer size needed for the info list.
      unsigned int nInfoListSize = 0;
      nppStatus = nppiCompressedMarkerLabelsUFGetInfoListSize_32u_C1R(nCompressedLabelCount, &nInfoListSize);

      // Allocate the info list on the device and a pinned copy on the host.
      NppiCompressedMarkerLabelsInfo* pMarkerLabelsInfoList, * pMarkerLabelsInfoListHost;
      cudaError = cudaMalloc((void**)&pMarkerLabelsInfoList, nInfoListSize);
      cudaError = cudaMallocHost((void**)&pMarkerLabelsInfoListHost, nInfoListSize);

      // Gather the per-label info; all optional outputs are left as NULL.
      nppStatus = nppiCompressedMarkerLabelsUFInfo_32u_C1R_Ctx(
                      pUFLabelDev[nImage], oSizeROI[nImage].width * sizeof(Npp32u), oSizeROI[nImage],
                      nCompressedLabelCount, pMarkerLabelsInfoList,
                      NULL, 0,
                      NULL, 0,
                      NULL,
                      NULL,
                      NULL,
                      NULL,
                      NULL,
                      nppStreamCtx);

      memset(pMarkerLabelsInfoListHost, 0, nInfoListSize);

      // Copy the info list back to the host.
      cudaError = cudaMemcpy(
          pMarkerLabelsInfoListHost,
          pMarkerLabelsInfoList,
          nInfoListSize,
          cudaMemcpyDeviceToHost);

      // Release both buffers.
      if (pMarkerLabelsInfoList != 0) cudaFree(pMarkerLabelsInfoList);
      if (pMarkerLabelsInfoListHost != 0) cudaFreeHost(pMarkerLabelsInfoListHost);
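
Note that the snippet above stores each return value in nppStatus or cudaError but never inspects it, so an earlier failure could go unnoticed. Below is a minimal sketch of a status check that could follow each call. It uses a plain int as a stand-in for NppStatus so that it compiles without the NPP headers; classifyStatus and checkStatus are hypothetical helper names, not part of NPP. It relies on the NPP convention that 0 means success, negative values are errors, and positive values are warnings.

```cpp
#include <cassert>
#include <cstdio>

// Stand-in for NppStatus: a plain int, so this sketch builds without NPP.
// Convention assumed: 0 = success, negative = error, positive = warning.
enum class StatusKind { Success, Warning, Error };

inline StatusKind classifyStatus(int status)
{
    if (status == 0) return StatusKind::Success;
    return (status < 0) ? StatusKind::Error : StatusKind::Warning;
}

// Hypothetical helper: logs the call name together with its status and
// returns false only for errors, so the caller can stop at the first failure.
inline bool checkStatus(int status, const char* callName)
{
    assert(callName != nullptr);
    const StatusKind kind = classifyStatus(status);
    if (kind == StatusKind::Error) {
        std::fprintf(stderr, "%s failed with status %d\n", callName, status);
        return false;
    }
    if (kind == StatusKind::Warning) {
        std::fprintf(stderr, "%s returned warning %d\n", callName, status);
    }
    return true;
}
```

Calling something like checkStatus(nppStatus, "nppiCompressedMarkerLabelsUFGetInfoListSize_32u_C1R") after each step would at least show whether -1000 is the first status that deviates from success.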

Many thanks in advance,
Manfred

I’m working on Win 10, Visual Studio 2022, CUDA Version 11.6, Quadro T2000, Driver Version 511.23

batchedLabelMarkersAndLabelCompressionNPP.cpp (34.7 KB)

Hi,
by printing the results for the example images, I found further things which seem not to be correct.
This is the output of the skull image (CT_skull_512x512_8u.raw):
I marked some of the lines with unusual results (but there are many more of them).

Rect #  0 :  PixelCount:   2644  @  BoundingBox.x    0, y    0, width  479, height  503
 Rect #  1 :  PixelCount: 174681  @  BoundingBox.x    1, y    4, width  511, height  511
 **Rect #  2 :  PixelCount:      4  @  BoundingBox.x   40, y    5, width   40, height    5** -> width and height 40*5, but only 4 pixels?
 Rect #  3 :  PixelCount:      2  @  BoundingBox.x   16, y    6, width   16, height    6
 Rect #  4 :  PixelCount:  40446  @  BoundingBox.x   57, y    7, width  455, height  454
 **Rect #  5 :  PixelCount:      3  @  BoundingBox.x   57, y   24, width    0, height    0** -> w/h 0?
 Rect #  6 :  PixelCount:    195  @  BoundingBox.x  313, y   17, width  371, height   38
 Rect #  7 :  PixelCount:      2  @  BoundingBox.x  261, y   25, width  261, height   13
 Rect #  8 :  PixelCount:    123  @  BoundingBox.x  270, y   16, width  287, height   18
 **Rect #  9 :  PixelCount:      2  @  BoundingBox.x  611, y   32, width    0, height    0** -> x = 611, but image is 512x512?
 **Rect # 10 :  PixelCount:     25  @  BoundingBox.x  245, y   18, width    0, height    0** 25 pixels, but w/h zero?
 Rect # 11 :  PixelCount:     59  @  BoundingBox.x  215, y   23, width    0, height   27
 Rect # 12 :  PixelCount:     34  @  BoundingBox.x  193, y   29, width  207, height   35
 Rect # 13 :  PixelCount:    510  @  BoundingBox.x  236, y   29, width  295, height   39
 Rect # 14 :  PixelCount:     18  @  BoundingBox.x  178, y   35, width  182, height   37
 Rect # 15 :  PixelCount:      2  @  BoundingBox.x  377, y   70, width    0, height    0
 Rect # 16 :  PixelCount:     20  @  BoundingBox.x  169, y   40, width  176, height   40
 Rect # 17 :  PixelCount:     92  @  BoundingBox.x  211, y   42, width  232, height   45
 Rect # 18 :  PixelCount:    213  @  BoundingBox.x  327, y 5839, width  351, height   51
 Rect # 19 :  PixelCount:      6  @  BoundingBox.x  288, y  252, width  289, height   43
 Rect # 20 :  PixelCount:  30868  @  BoundingBox.x  127, y   45, width  425, height  350
 **Rect # 21 :  PixelCount:     10  @  BoundingBox.x 1616, y   44, width    0, height    0**
 Rect # 22 :  PixelCount:      1  @  BoundingBox.x  157, y   44, width    0, height    0
 Rect # 23 :  PixelCount:     15  @  BoundingBox.x  197, y   44, width  201, height   46
 Rect # 24 :  PixelCount:      1  @  BoundingBox.x  207, y   44, width    0, height    0
 Rect # 25 :  PixelCount:      1  @  BoundingBox.x  266, y   44, width    0, height    0
 Rect # 26 :  PixelCount:     22  @  BoundingBox.x  148, y   49, width  154, height   51
 Rect # 27 :  PixelCount:      4  @  BoundingBox.x  918, y   47, width  306, height   50
 **Rect # 28 :  PixelCount:     12  @  BoundingBox.x 3153, y   52, width  265, height    0**
 Rect # 29 :  PixelCount:      2  @  BoundingBox.x  283, y  108, width    0, height    0
 Rect # 30 :  PixelCount:     10  @  BoundingBox.x  263, y   56, width  266, height   59
 **Rect # 31 :  PixelCount:      2  @  BoundingBox.x  337, y   57, width  337, height   57** -> x 337, width 337 -> width = 512?
 Rect # 32 :  PixelCount:      2  @  BoundingBox.x  520, y   60, width  260, height   60
 Rect # 33 :  PixelCount:     17  @  BoundingBox.x  348, y   65, width  351, height   65
... and so on...

To me, this looks like a synchronization issue. But right before writing these lines, I’m calling
cudaStreamSynchronize(nppStreamCtx.hStream);

Here is the source code I used to print these lines:

      if (nImage == 1)
      {
          for (unsigned int l = 0; l < nCompressedLabelCount; l++)
          {
              printf(" Rect # %2d :  PixelCount: %6d  @  BoundingBox.x %4d, y %4d, width %4d, height %4d\n",
                  l,
                  pMarkerLabelsInfoListHost[l].nMarkerLabelPixelCount,
                  pMarkerLabelsInfoListHost[l].oMarkerLabelBoundingBox.x,
                  pMarkerLabelsInfoListHost[l].oMarkerLabelBoundingBox.y,
                  pMarkerLabelsInfoListHost[l].oMarkerLabelBoundingBox.width,
                  pMarkerLabelsInfoListHost[l].oMarkerLabelBoundingBox.height);
          }
      }
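
Entries like the marked ones could also be flagged automatically instead of by eye. The following is only a sketch: Box is a hypothetical stand-in for the four bounding-box fields printed above, and findOutOfImageOrigins is an illustrative name. It tests only that x and y lie inside the image, because that should hold however the width and height fields are meant to be interpreted.

```cpp
#include <cassert>
#include <vector>

// Hypothetical mirror of the four bounding-box fields printed above.
struct Box { int x; int y; int width; int height; };

// Returns the indices of all entries whose x or y lies outside an image
// of imageWidth x imageHeight pixels.
inline std::vector<unsigned int> findOutOfImageOrigins(const std::vector<Box>& boxes,
                                                       int imageWidth, int imageHeight)
{
    assert(imageWidth > 0 && imageHeight > 0);
    std::vector<unsigned int> bad;
    for (unsigned int i = 0; i < boxes.size(); ++i) {
        const Box& b = boxes[i];
        const bool xOutside = (b.x < 0) || (b.x >= imageWidth);
        const bool yOutside = (b.y < 0) || (b.y >= imageHeight);
        if (xOutside || yOutside) {
            bad.push_back(i);
        }
    }
    return bad;
}
```

For the 512x512 skull image, this would report for example Rect # 9 (x = 611) and Rect # 18 (y = 5839) from the listing above.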

Can someone please give me some feedback about that? Am I doing something wrong?

You might be hitting a Windows kernel timeout.

I don’t have any trouble running the sample code at

7_CUDALibraries/batchedLabelMarkersAndLabelCompressionNPP

on a Linux machine with no kernel timeout

Thanks,

Do you know what to do when it runs into a kernel timeout?

Did you also check the results in pMarkerLabelsInfoListHost? Are they also correct on Linux?

Thanks,
Manfred

Kernel timeout on Windows: Timeout Detection & Recovery (TDR)

I didn’t check any results. Just ran the app under compute-sanitizer, no runtime errors reported.

I tested the same (patched) example on another Win10 Pro 64 Bit machine with an NVIDIA RTX A6000 GPU and got the same results, this time compiled with VS2017 instead of VS2022.

Time measurements at every call of the function
nppiCompressedMarkerLabelsUFInfo_32u_C1R_Ctx
showed about 2 milliseconds per call.
Therefore I cannot believe that this can lead to a kernel timeout.
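
For context, a rough sketch of how such a measurement can be compared with the TDR limit. It assumes the Windows default TdrDelay of 2 seconds has not been changed on the machine; elapsedMs and exceedsTdrDelay are hypothetical helper names. For a GPU call, the timed callable would also have to synchronize the stream before returning, otherwise only the launch overhead is measured.

```cpp
#include <cassert>
#include <chrono>

// Default Windows TDR delay in milliseconds (assumed unchanged on this machine).
constexpr double kDefaultTdrDelayMs = 2000.0;

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
// For GPU work, `work` should include the stream synchronization.
template <typename Callable>
double elapsedMs(Callable&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// True if a measured duration reaches the TDR delay.
inline bool exceedsTdrDelay(double measuredMs, double tdrDelayMs = kDefaultTdrDelayMs)
{
    assert(tdrDelayMs > 0.0);
    return measuredMs >= tdrDelayMs;
}
```

With the roughly 2 ms per call reported above, exceedsTdrDelay(2.0) is false by three orders of magnitude.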

Another detail: when using the pContoursImage option (parameters #6 and #7), the resulting contours seem to be correct (the -1000 error is still there and the rectangles are also corrupt). But with this test, pContoursImage was null.

Has anyone already used the function nppiCompressedMarkerLabelsUFInfo_32u_C1R_Ctx and can give me some input on that? Is there maybe some sample code available?

GPU Device 0: “NVIDIA RTX A6000” with compute capability 8.6
NPP Library Version 11.6.0
CUDA Driver Version: 11.6
CUDA Runtime Version: 11.6

Adding the internal ticket conclusion here.
The demonstrator for this API is CUDALibrarySamples/NPP/findContour in the NVIDIA/CUDALibrarySamples repository on GitHub.