nppiFilterGaussBorder_8u_C1R returns -1000 (NPP_CUDA_KERNEL_EXECUTION_ERROR)

Software Version
DRIVE OS Linux 5.2.6 and DriveWorks 4.0

Hardware Platform
DRIVE AGX Xavier

Hi,

I am trying to use nppiFilterGaussBorder_8u_C1R, but whatever I do it returns -1000 (NPP_CUDA_KERNEL_EXECUTION_ERROR), and I don’t know what the issue could be.

        cudaStream_t cudaStream;
        if(auto const cudaRes{cudaStreamCreate(&cudaStream)}; cudaRes != cudaSuccess)
        {
            GST_ERROR("Cannot create cuda stream: %d", cudaRes);
        }

        if(nppGetStream() != cudaStream)
        {
            if(auto const setStreamResult{nppSetStream(cudaStream)}; setStreamResult != 0)
            {
                GST_ERROR("nppSetStream error: %d", setStreamResult);
            }
        }
        
        Npp8u* cudaMem = nppsMalloc_8u(256 * 256);
        size_t pitch = 256;

        Npp8u* cudaMemDst = nppsMalloc_8u(256 * 256);
        size_t pitchDst = 256;

        if(cudaMem == nullptr)
        {
            GST_ERROR("nppsMalloc_8u failed");
            throw std::runtime_error("Error malloc");
        }

        if(cudaMemDst == nullptr)
        {
            GST_ERROR("nppsMalloc_8u dst failed");
            throw std::runtime_error("Error malloc dst");
        }

        if(auto const memsetRes{nppsZero_8u(cudaMem, 256 * 256)}; memsetRes != 0)
        {
            GST_ERROR("nppsZero_8u failed %d", memsetRes);
        }

        if(auto const memsetRes{nppsZero_8u(cudaMemDst, 256 * 256)}; memsetRes != 0)
        {
            GST_ERROR("nppsZero_8u dst failed %d", memsetRes);
        }

        if(auto const syncRes{cudaStreamSynchronize(cudaStream)}; syncRes != cudaSuccess)
        {
            GST_ERROR("cudaStreamSynchronize 1 failed %d", syncRes);
        }

        auto const nppiError{nppiFilterGaussBorder_8u_C1R(
            cudaMem + 16 * pitch + 16,
            pitch,
            NppiSize{256, 256},
            NppiPoint{16, 16},
            cudaMemDst + 16 * pitchDst + 16,
            pitchDst,
            NppiSize{256 - (2 * 16), 256 - (2 * 16)},
            NPP_MASK_SIZE_3_X_3, 
            NPP_BORDER_REPLICATE
        )};

        if(nppiError != 0)
        {
            GST_ERROR("nppiFilterGaussBorder_8u_C1R failed: %d", nppiError);
        }

        if(auto const syncRes{cudaStreamSynchronize(cudaStream)}; syncRes != cudaSuccess)
        {
            GST_ERROR("[BlurRectDrawer] cudaStreamSynchronize 3 failed %d", syncRes);
        }

        nppsFree(cudaMem);
        nppsFree(cudaMemDst);
        cudaStreamDestroy(cudaStream);

Could you please help me figure out the problem?

This forum is intended for developers who are part of the NVIDIA DRIVE™ AGX SDK Developer Program. To access the forum, please make sure to use an account associated with your corporate or university email address. Thanks.

@VickNV Do I need to re-post my question, or is it OK like this?

It’s okay like this. I’ll check your question with our team and get back to you. Thanks.

Have you tried running any of the available CUDA samples that use NPP on your system? That may help narrow down the issue and determine whether it is specific to your implementation or caused by something else in your setup.

@VickNV Hmm, I took the boxFilterNPP sample (cuda-samples/Samples/boxFilterNPP at v10.2 in NVIDIA/cuda-samples on GitHub) and it ran well.
Then I modified it to use nppiFilterGaussBorder_8u_C1R and it ran without any issue.
Let me check why it fails in our application code but runs well as a standalone application. Thanks for suggesting this test!

Glad to hear that the modified sample code ran without any issues.
