VPI error: CUDA device uninitialized

My program is based on your CudaBayerDemosaic Argus example: the Argus stream is produced in the main thread, while processing runs in another thread built on your Thread class from Argus/Samples/Utils. This works well in general, but now I want to use VPI for some of the processing. When I use VPI, the VPI output buffer is not filled, and I get an error message when closing the program (not when the actual calculation function is called).

Error message: CUDA device uninitialized.

If you could help me out here, that would be wonderful!

Here are some important parts of my code:

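    #include <sstream>
    #include <stdexcept>
    #include <vpi/Status.h>

    // Throws std::runtime_error carrying VPI's last status message if STMT fails.
    // No ';' after while (0), so "CHECK_STATUS(x);" is exactly one statement.
    #define CHECK_STATUS(STMT)                                    \
        do                                                        \
        {                                                         \
            VPIStatus status = (STMT);                            \
            if (status != VPI_SUCCESS)                            \
            {                                                     \
                char buffer[VPI_MAX_STATUS_MESSAGE_LENGTH];       \
                vpiGetLastStatusMessage(buffer, sizeof(buffer));  \
                std::ostringstream ss;                            \
                ss << vpiStatusGetName(status) << ": " << buffer; \
                throw std::runtime_error(ss.str());               \
            }                                                     \
        } while (0)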
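A `do { ... } while (0)` macro should not end in a semicolon after `while (0)`: the wrapper exists so that `CHECK_STATUS(x);` expands to exactly one statement, and an extra `;` inside the macro adds an empty statement that breaks `if (a) CHECK_STATUS(x); else ...`. The sketch below is stdlib-only so it builds without VPI; `FakeStatus`, `fakeStatusName` and `CHECK_FAKE_STATUS` are hypothetical stand-ins for `VPIStatus`, `vpiStatusGetName` and the macro above.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for VPIStatus / vpiStatusGetName (no VPI needed).
enum FakeStatus { FAKE_SUCCESS, FAKE_ERROR_INTERNAL };

inline const char* fakeStatusName(FakeStatus s)
{
    return s == FAKE_SUCCESS ? "FAKE_SUCCESS" : "FAKE_ERROR_INTERNAL";
}

// Same shape as CHECK_STATUS, with no ';' after while (0):
// the caller's semicolon terminates the single do/while statement.
#define CHECK_FAKE_STATUS(STMT)                              \
    do                                                       \
    {                                                        \
        FakeStatus status = (STMT);                          \
        if (status != FAKE_SUCCESS)                          \
        {                                                    \
            std::ostringstream ss;                           \
            ss << fakeStatusName(status) << ": call failed"; \
            throw std::runtime_error(ss.str());              \
        }                                                    \
    } while (0)

// Compiles only because the macro is one statement; with "while (0);"
// inside the macro, the 'else' below would have no matching 'if'.
inline std::string runChecked(bool useMacro, FakeStatus s)
{
    if (useMacro)
        CHECK_FAKE_STATUS(s);
    else
        return "skipped";
    return "ok";
}
```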

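    bool ICGGreyConsumer::threadInitialize()
    {
        // ... (rest of the function not shown)

        // Wrap the existing CUDA context (m_cudaContext) in a VPI context.
        CHECK_STATUS(vpiContextCreateCudaContextWrapper(0, m_cudaContext, &vpi_context));

        // ...
    }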
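In VPI, as in the CUDA driver API, the *current* context is tracked per thread: as far as I know, `vpiContextSetCurrent` affects only the calling thread, creating a context does not make it current, and objects created while no context is current end up in VPI's default context rather than in `vpi_context`. Whether this is what goes wrong here is not established in this thread; it is one thing worth checking. The stdlib-only model below (hypothetical `Context`, `setCurrent`, `currentName`, `seenByWorker`; no VPI calls) shows the per-thread behaviour: a context made current in one thread is invisible in another.

```cpp
#include <cassert>
#include <string>
#include <thread>

// Hypothetical model of a per-thread "current context" (not VPI code).
struct Context
{
    std::string name;
};

// thread_local: every thread has its own copy of this pointer.
thread_local const Context* g_current = nullptr;

void setCurrent(const Context* ctx) { g_current = ctx; }

// Context that work issued from this thread would use;
// "default" stands in for a library's fallback context.
std::string currentName() { return g_current ? g_current->name : "default"; }

// What a worker thread sees, optionally after setting the context itself.
std::string seenByWorker(const Context* ctx, bool workerSetsIt)
{
    std::string seen;
    std::thread worker([&] {
        if (workerSetsIt)
            setCurrent(ctx);
        seen = currentName();
    });
    worker.join();
    return seen;
}
```

In the consumer, the corresponding step would presumably be a `vpiContextSetCurrent(vpi_context)` call right after creating the wrapper, on the same thread that later runs `threadExecute()`, and before `stream` (creation not shown) and the images are created. The same per-thread rule applies to `m_cudaContext` on the CUDA driver API side (`cuCtxSetCurrent` / `cuCtxPushCurrent`).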


    bool ICGGreyConsumer::threadExecute()
    {
        // ... (code not shown)

        if (frame_count == 0)
        {
            // One-time setup: pitched U16 scratch buffer for the filter output.
            cudaMallocPitch(&tmpBuffer, &tmpPitch, (size_t)bayerEglFrame.width * 2, (size_t)bayerEglFrame.height);

            // Describe plane 0 of the EGL frame as a single-plane U16 image.
            input_plane.data       = bayerEglFrame.frame.pPitch[0];
            input_plane.height     = bayerEglFrame.height;
            input_plane.width      = bayerEglFrame.width;
            input_plane.pitchBytes = bayerEglFrame.pitch;
            input_plane.pixelType  = VPI_PIXEL_TYPE_U16;

            input_data.numPlanes = 1;
            input_data.format    = VPI_IMAGE_FORMAT_U16;
            input_data.planes[0] = input_plane;

            // Output shares height/width/format with the input, but its pitch is the
            // one cudaMallocPitch returned for tmpBuffer, not the EGL frame's pitch.
            output_plane.data       = tmpBuffer;
            output_plane.height     = bayerEglFrame.height;
            output_plane.width      = bayerEglFrame.width;
            output_plane.pitchBytes = tmpPitch;
            output_plane.pixelType  = VPI_PIXEL_TYPE_U16;

            output_data.numPlanes = 1;
            output_data.format    = VPI_IMAGE_FORMAT_U16;
            output_data.planes[0] = output_plane;

            CHECK_STATUS(vpiImageCreateCUDAMemWrapper(&input_data, 0, &input));
            CHECK_STATUS(vpiImageCreateCUDAMemWrapper(&output_data, 0, &output));
        }

        // Every frame: point the input wrapper at the new EGL frame and submit the filter.
        input_data.planes[0].data = bayerEglFrame.frame.pPitch[0];
        CHECK_STATUS(vpiImageSetWrappedCUDAMem(input, &input_data));
        CHECK_STATUS(vpiSubmitBilateralFilter(stream, VPI_BACKEND_CUDA, input, output, 7, 50, 1.7, VPI_BORDER_ZERO));

        // ... (rest of the function not shown)
    }
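Each plane descriptor carries both a width and a pitch. The pitch is the byte distance between the starts of consecutive rows, so it can exceed `width * 2` bytes for U16 data, and two separately allocated buffers need not share it: `cudaMallocPitch` reports the pitch of `tmpBuffer` through `tmpPitch`, while the EGL frame reports `bayerEglFrame.pitch`. The stdlib-only sketch below (hypothetical `byteOffset` and `PitchedImage`; the sizes in the usage are illustrative, not taken from the camera) shows how a pixel's byte offset depends on the pitch, and what happens when a buffer is addressed with a pitch that is not its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Byte offset of pixel (x, y) in a pitched U16 image.
std::size_t byteOffset(std::size_t pitchBytes, std::size_t x, std::size_t y)
{
    return y * pitchBytes + x * sizeof(std::uint16_t);
}

// Hypothetical host-side stand-in for a pitched allocation.
struct PitchedImage
{
    std::size_t width, height, pitchBytes;
    std::vector<unsigned char> bytes;

    PitchedImage(std::size_t w, std::size_t h, std::size_t pitch)
        : width(w), height(h), pitchBytes(pitch), bytes(pitch * h, 0)
    {
        assert(pitch >= w * sizeof(std::uint16_t)); // a row must fit within one pitch
    }

    // Writes a pixel using whatever pitch the caller *describes* the buffer with.
    void write(std::size_t describedPitch, std::size_t x, std::size_t y, std::uint16_t v)
    {
        std::memcpy(&bytes[byteOffset(describedPitch, x, y)], &v, sizeof v);
    }

    // Reads a pixel using the buffer's real pitch.
    std::uint16_t read(std::size_t x, std::size_t y) const
    {
        std::uint16_t v;
        std::memcpy(&v, &bytes[byteOffset(pitchBytes, x, y)], sizeof v);
        return v;
    }
};
```

For example, with 1920 U16 pixels per row (3840 bytes) and a real pitch of 3840, a writer that assumes a 4096-byte pitch puts row 2 at byte 8192 instead of 7680, i.e. 256 pixels too far. With a pitch that is too large, the last rows also run past the end of the allocation, which is why `output_plane.pitchBytes` should come from `tmpPitch` rather than from the input frame.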


    bool ICGGreyConsumer::threadShutdown()
    {
        // (body not shown)
    }

May I know whether the error is returned by vpiImageCreateCUDAMemWrapper or by vpiSubmitBilateralFilter?
Also, would you mind checking whether the implementation works in the main thread?


Both vpiImageCreateCUDAMemWrapper and vpiSubmitBilateralFilter return VPI_SUCCESS. The problem is that the output buffer is not filled.
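A `VPI_SUCCESS` from a `vpiSubmit*` call only means the work was queued on `stream`; the filter runs asynchronously, and its result (and, as far as I know, any error raised during execution) is only guaranteed to be available after synchronizing, e.g. with `vpiStreamSync(stream)`, before the output is read or handed to other CUDA work. The snippets above do not show such a sync, although it may exist elsewhere in the program. The stdlib-only model below (hypothetical `ToyStream` with `submit`/`sync`, built on `std::async`; not VPI code) illustrates the submit-then-sync pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical model of an asynchronous work queue (not VPI code).
// Unlike a VPI stream, jobs here may overlap instead of running in
// submission order; the model only illustrates submit vs. sync.
class ToyStream
{
public:
    // Queues work and returns immediately, like a vpiSubmit* call.
    void submit(std::function<void()> work)
    {
        pending_.push_back(std::async(std::launch::async, std::move(work)));
    }

    // Blocks until all queued work has finished, like vpiStreamSync;
    // get() also rethrows any exception raised by the work.
    void sync()
    {
        for (auto& f : pending_)
            f.get();
        pending_.clear();
    }

private:
    std::vector<std::future<void>> pending_;
};

// Stand-in "filter": writes 2 * input into the output buffer.
void submitDouble(ToyStream& s, const std::vector<std::uint16_t>& in, std::vector<std::uint16_t>& out)
{
    s.submit([&in, &out] {
        for (std::size_t i = 0; i < in.size(); ++i)
            out[i] = static_cast<std::uint16_t>(in[i] * 2);
    });
}
```

In the consumer, the equivalent would be `CHECK_STATUS(vpiStreamSync(stream));` after `vpiSubmitBilateralFilter` and before `tmpBuffer` is consumed.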

When closing the whole program, I get this error:

[WARN ] 2022-01-05 13:11:57 (cudaErrorDeviceUninitialized)
[ERROR] 2022-01-05 13:11:57 Error destroying cuda device: VPI_ERROR_INTERNAL: (cudaErrorDeviceUninitialized)
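According to the CUDA runtime documentation, `cudaErrorDeviceUninitialized` most often means that no CUDA context is bound to the calling thread, or that a context handle is no longer valid (for example, after `cuCtxDestroy`). A plausible but unconfirmed cause at exit is therefore an ordering problem: VPI objects wrapping `m_cudaContext` being torn down after that CUDA context was destroyed, or on a thread where it is not current. In C++ one way to pin the order down is RAII, since members are destroyed in reverse declaration order. The sketch below is stdlib-only; `CudaContextOwner`, `VpiWrapperOwner`, `Consumer` and the log strings are hypothetical placeholders for the real CUDA/VPI cleanup calls.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical owners; each destructor logs instead of calling CUDA/VPI.
struct CudaContextOwner
{
    std::vector<std::string>& log;
    explicit CudaContextOwner(std::vector<std::string>& l) : log(l) {}
    ~CudaContextOwner() { log.push_back("destroy CUDA context"); }
};

struct VpiWrapperOwner
{
    std::vector<std::string>& log;
    explicit VpiWrapperOwner(std::vector<std::string>& l) : log(l) {}
    ~VpiWrapperOwner() { log.push_back("destroy VPI objects and context"); }
};

// Members are destroyed in reverse declaration order, so declaring the CUDA
// context first guarantees the VPI wrapper around it is released before it.
struct Consumer
{
    CudaContextOwner cuda;
    VpiWrapperOwner vpi;
    explicit Consumer(std::vector<std::string>& log) : cuda(log), vpi(log) {}
};
```

In the real consumer, the VPI side would presumably be `vpiImageDestroy` and `vpiStreamDestroy` followed by `vpiContextDestroy(vpi_context)`, run in `threadShutdown()` while the CUDA context is still alive and current on that thread.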

I could try using VPI from the main thread, but I would have to change too much; it would not be a valid test…


Could you share a simple sample that reproduces this, so we can check the issue on our side?


I will send you code as soon as possible.


Have you sent out the reproducible sample?