dwImageStreamer_initialize invalid device context

Hello

I am modifying the DW sample camera_multiple_gmsl, trying to set up a streamer from NvMedia to CUDA as follows:

dwImageProperties cameraImageProperties;
result = dwSensorCamera_getImageProperties(&cameraImageProperties, DW_CAMERA_PROCESSED_IMAGE, cameraSensor->sensor);
result = dwImageStreamer_initialize(&nvm2CUDA, &cameraImageProperties, DW_IMAGE_CUDA, sdk);
if (result != DW_SUCCESS) {
	std::cerr << "\n ERROR Initialising stream: "  << dwGetStatusName(result) << std::endl;
}

It compiles correctly, but at execution time I receive an error saying

Driveworks exception thrown: DW_CUDA_ERROR: StreamConsumerCUDA: Fail to retreive context at init : invalid device context
 ERROR Initialising stream: DW_CUDA_ERROR

Any idea why this could be?

Regards

Dear dtorres1,
It is very difficult to identify the problem from the code snippet you have provided. It looks like you are using the same ImageStreamer for all cameras. If so, can you assign a different streamer to each camera? It would be great if you could provide a code snippet and steps to reproduce the error on our side.

Hello

I am creating a Streamer per camera.

Essentially, I am trying to modify camera_multiple_gmsl to include color correction, for which I am also following the colorcorrection sample, so far without success.

Below is a minimal version of the modified threadCameraPipeline function of camera_multiple_gmsl, where the streamer is initialized.

void threadCameraPipeline(Camera* cameraSensor, uint32_t port, dwContextHandle_t sdk, WindowBase* window) {
	dwStatus result;
	int32_t pool_size = 2;
	uint32_t numFramesRGB = pool_size*cameraSensor->numSiblings;
	bool eof;
	dwImageStreamerHandle_t nvm2CUDA = DW_NULL_HANDLE;

	std::vector<dwImageCUDA> frameRGBA;
	frameRGBA.reserve(numFramesRGB); // the pool stores pointers into this vector, so prevent reallocation
	{	
		dwImageProperties cameraImageProperties;
		dwSensorCamera_getImageProperties(&cameraImageProperties, DW_CAMERA_PROCESSED_IMAGE, cameraSensor->sensor);
		
		cameraImageProperties.type = DW_IMAGE_CUDA;
		dwImageProperties displayImageProperties = cameraImageProperties;
		displayImageProperties.pxlFormat = DW_IMAGE_RGBA;
		displayImageProperties.planeCount = 1;
		
		dwImageFormatConverter_initialize(&cameraSensor->yuv2rgba, cameraImageProperties.type, sdk); 
		
		dwImageProperties nvm2cudaProps = cameraImageProperties;
		nvm2cudaProps.type = DW_IMAGE_NVMEDIA;
		nvm2cudaProps.pxlFormat = DW_IMAGE_YUV420;
		nvm2cudaProps.pxlType = DW_TYPE_UINT8;
		
		dwImageStreamer_initialize(&nvm2CUDA, &nvm2cudaProps, DW_IMAGE_CUDA, sdk);

		for (uint32_t cameraIdx = 0; cameraIdx < cameraSensor->numSiblings; cameraIdx++) {
			for (int32_t k = 0; k < pool_size; k++) {
				dwImageCUDA rgba{};
				void *dptr   = nullptr;
				size_t pitch = 0;
				cudaMallocPitch(&dptr, &pitch, cameraImageProperties.width * 4, cameraImageProperties.height); // 4 channels for RGBA
				dwImageCUDA_setFromPitch(&rgba, dptr, cameraImageProperties.width, cameraImageProperties.height, pitch, DW_IMAGE_RGBA);

				frameRGBA.push_back(rgba);
				cameraSensor->rgbaPool.push(&frameRGBA.back());
			}
		} 
		
		g_run = g_run && dwSensor_start(cameraSensor->sensor) == DW_SUCCESS;
		eof = false;
	}
	
	// main loop
	while (g_run) {
		bool eofAny = false;
		{
			if (eof) {
				eofAny = true;
				continue;
			}

			if (cameraSensor->rgbaPool.empty()) {
				std::cerr << "Ran out of RGBA buffers, continuing" << std::endl;
				continue;
			}

			for (uint32_t cameraIdx = 0;  cameraIdx < cameraSensor->numSiblings && !cameraSensor->rgbaPool.empty(); cameraIdx++) {
				
				eof = captureCamera(cameraSensor->rgbaPool.front(),
									cameraSensor->sensor, cameraIdx,
									cameraSensor->yuv2rgba,
									nvm2CUDA );
				g_frameRGBAPtr[port][cameraIdx] = cameraSensor->rgbaPool.front();
				cameraSensor->rgbaPool.pop();

				if (!eof) {
					cameraSensor->rgbaPool.push(g_frameRGBAPtr[port][cameraIdx]);
				}
				eofAny |= eof;
			}
		}
		
		std::this_thread::sleep_for(std::chrono::milliseconds(30));

		g_run = g_run && !eofAny;
	}

	//Clean up
		.......
}

The function captureCamera has been modified to follow a flow similar to runSingleCameraPipeline from the colorcorrection sample.

What am I doing wrong?

Dear dtorres1,
May I know what your objective is? It looks like you are creating a dwImageCUDA pool in this snippet, yet earlier you mentioned you are using an ImageStreamer with NvMedia as producer and CUDA as consumer, which is confusing. From the error it is clear that there is an issue with acquiring a context on the GPU. Were you able to run other samples which use the GPU?

Hello

The dwImageCUDA pool is for receiving the frames from captureCamera. I intend to do the color correction inside captureCamera; I am not worried about delays there.

Perhaps a description of captureCamera would help; see below:

dwStatus captureCamera(dwImageCUDA *frameCUDArgba,
			dwSensorHandle_t cameraSensor,
			uint32_t sibling,
			dwImageFormatConverterHandle_t yuv2rgba,
			dwImageStreamerHandle_t nvm2CUDA_)
{
	dwCameraFrameHandle_t frameHandle;
	dwImageNvMedia *frameNVMyuv = nullptr;

	dwStatus result = DW_FAILURE;
	result = dwSensorCamera_readFrame(&frameHandle, sibling, 300000, cameraSensor);
	if (result != DW_SUCCESS) {
		std::cout << "readFrameNvMedia: " << dwGetStatusName(result) << std::endl;
		return result;
	}

	result = dwSensorCamera_getImageNvMedia(&frameNVMyuv, DW_CAMERA_PROCESSED_IMAGE, frameHandle);
	if( result != DW_SUCCESS ){
		std::cout << "getImageNvMedia: " << dwGetStatusName(result) << std::endl;
	}
	
	dwImageCUDA *frameCUDAyuv = nullptr;
	dwImageStreamer_postNvMedia(frameNVMyuv, nvm2CUDA_);
	result = dwImageStreamer_receiveCUDA(&frameCUDAyuv, 30000, nvm2CUDA_);
	if (result != DW_SUCCESS) {
		std::cerr << "did not receive CUDA frame within 30 ms" << std::endl;
		dwImageStreamer_waitPostedNvMedia(&frameNVMyuv, 30000, nvm2CUDA_); // reclaim the posted NvMedia image
		dwSensorCamera_returnFrame(&frameHandle);                          // avoid leaking the camera frame
		return result;
	}
	
	///// Color correct here?
	
	
	///////
	result = dwImageFormatConverter_copyConvertCUDA(frameCUDArgba, frameCUDAyuv, yuv2rgba, 0);
	if (result != DW_SUCCESS) 
	{
		std::cerr << "ERROR converting CUDA format: " << dwGetStatusName(result) << std::endl;
	}
		
	
	dwImageNvMedia *processedNVM;
	result = dwImageStreamer_returnReceivedCUDA(frameCUDAyuv, nvm2CUDA_);
	if(result != DW_SUCCESS)
	{
		std::cerr << "ERROR cannot return CUDA: " <<  dwGetStatusName(result) << std::endl;
	}
	dwImageStreamer_waitPostedNvMedia(&processedNVM, 30000, nvm2CUDA_);			
	
	result = dwSensorCamera_returnFrame(&frameHandle);
	if( result != DW_SUCCESS ){
		std::cout << "returnFrame: " << dwGetStatusName(result) << std::endl;
	} 
	return DW_SUCCESS;
}

The process would be: get the frame in NvMedia -> post to CUDA -> color-correct -> convert YUV to RGBA.

Yes, samples run normally, and even other custom-made applications run; e.g. sample_image_streamer_simple runs fine.

Regards

Hello,
OK. Please try moving the ImageStreamer creation into each thread. The issue could be that the main thread is not able to create a separate GPU context associated with each streamer.

Hello

Yes, that was the problem. Thank you

Could you please indicate (for everyone's benefit, I guess) why, even when passing a pointer to the original context (sdk) to a different thread, it is not possible to use it on that thread?

Could you also indicate where the definition of a context can be found?

Dear dtorres1,
In the sample you are trying to create 4 streamers from a single thread and assigning one to each camera. But from one thread only one CUDA context can be created and all the CUDA calls gets executed in that context. Please check CUDA context management(https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html). In order to have multiple CUDA contexts, You need to call ImageStreamer creation in each thread seperately.