Need help - Creating a CUDA C++ .dll

Dear forum members,

I’m working on a project where I need to filter the green screen out of images, and I want to use CUDA to do it.

I use C# to get the data from the image (a byte array representing the colors of the pixels). Because of this I need to create a .dll that uses CUDA to filter the green screen from the image.

So far so good, but here comes my problem:

I created a .dll in Visual Studio 2012. This .dll has a .cu file with a “__declspec(dllexport)” function that filters the green screen from the image. The .cu file looks like this:

#include <iostream> 
#include <vector> 
#include <cuda_runtime.h> 
#include "stdafx.h"
 
#pragma comment(lib, "cudart") 
 
using std::cerr; 
using std::cout; 
using std::endl; 
using std::exception; 
using std::vector; 
 
const int idealThreadsPerBlock = 256;

__global__ void ChromakeyKernel(char *targetRgb, char *resultArgb, int widthImage, int filterCr, int filterCb, int squaredTollaranceA, int squaredTollaranceB)
{
	
	int callID = (((blockIdx.y * blockDim.y) + threadIdx.y) * widthImage) + ((blockIdx.x * blockDim.x) + threadIdx.x);

	int index = callID * 3; // Get correct index
	int resultIndex = callID * 4; // Get result index;

	int b = targetRgb[index];
	int g = targetRgb[index + 1];
	int r = targetRgb[index + 2];
	
	//Convert the pixel color from the RGB color space to the YCbCr color space (only Cb and Cr are needed)
	float currentCb = 128 + -0.168736 * r - 0.331264 * g + 0.5 * b;
	float currentCr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b;   

	float A = filterCb - currentCb;
	float B = filterCr - currentCr;
	float C = (A * A) + (B * B);
	
	if(squaredTollaranceA <= C && C <= squaredTollaranceB)
	{
		resultArgb[resultIndex] = 0;
		resultArgb[resultIndex + 1] = 1;
		resultArgb[resultIndex + 2] = 2;
		resultArgb[resultIndex + 3] = 3;
	}
	else
	{
		resultArgb[resultIndex] = b;
		resultArgb[resultIndex + 1] = g;
		resultArgb[resultIndex + 2] = r;
		resultArgb[resultIndex + 3] = 255;
	}
}

extern "C" int __declspec(dllexport) ChromakeyCUDA(char *targetRgb, int bytesTarget, char *resultArgb, int bytesResult, int widthImage, int heightImage, int filterCr, int filterCb, int TollaranceA, int TollaranceB)
{

	int squaredTollaranceA = TollaranceA * TollaranceA;
	int squaredTollaranceB = TollaranceB * TollaranceB;

	char *d_targetRgb;
	int sizeTarget = bytesTarget * sizeof(char);
	char *d_resultArgb;
	int sizeResult = bytesResult * sizeof(char);

	//// Create GPU memory
	cudaMalloc((void**)&d_targetRgb, sizeTarget);
	cudaMalloc((void**)&d_resultArgb, sizeResult);

	// Copy CPU memory to GPU memory
	//cudaMemcpy(d_targetRgb, targetRgb, sizeTarget, cudaMemcpyHostToDevice);

	// Create 2d blocks
	dim3 threadsPerBlock(8, 8);
	dim3 numBlocks(widthImage/threadsPerBlock.x,  /* for instance 512/8 = 64*/
              heightImage/threadsPerBlock.y); 
	//dim3 numBlocks(heightImage, widthImage / idealThreadsPerBlock);

	// Call the kernel method
	filterCr = 11;
	ChromakeyKernel<<<numBlocks, threadsPerBlock>>>(d_targetRgb, d_resultArgb, widthImage, filterCr, filterCb, squaredTollaranceA, squaredTollaranceB);
	filterCr = 222;
	//ChromakeyKernel<<<numBlocks, threadsPerBlock>>>(d_targetRgb, d_resultArgb, widthImage, filterCr, filterCb, squaredTollaranceA, squaredTollaranceB);
	
	// Copy the result from GPU memory to CPU mem
	cudaMemcpy(resultArgb, d_resultArgb, sizeResult, cudaMemcpyDeviceToHost);  

	// Free the GPU memory
	cudaFree(d_resultArgb);
	cudaFree(d_targetRgb);

	return filterCr;
}

This project builds successfully. But when I call this method from C# through the .dll, every byte in the result array is 0.
This means that the kernel method just doesn’t get executed.

I’m sure the .dll works, because when I make the ChromakeyCUDA method return an int value, the C# application receives that int as the return value when it calls the method.

I’m sure this code works too, so that isn’t the problem either.

Sorry, I copied my test code.

The kernel call isn’t correct. It has to be:

ChromakeyKernel<<<numBlocks, threadsPerBlock>>>(d_targetRgb, d_resultArgb, widthImage, filterCr, filterCb, squaredTollaranceA, squaredTollaranceB);

For some reason the edit function doesn’t work for me.
But thanks for reading, I hope you are able to help me.

Kind regards, Sam Brands

It’s an old post, but I was looking for chroma key solutions and stumbled on this. Uncomment line 68 of the .cu file (the cudaMemcpy that copies targetRgb from CPU memory to GPU memory), and it’ll work fine.

Regards, Michel.
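
Once the input copy is in place, one way to check the GPU result is to compare it against a plain CPU version of the same per-pixel test. Below is a minimal host-side C++ sketch, not code from the original post: the name ChromakeyCpuReference and its std::vector interface are made up for illustration. It mirrors the posted kernel, including the 0, 1, 2, 3 placeholder bytes written for keyed pixels, with one deliberate difference: pixels are read as unsigned char. The posted kernel reads through char*, which is signed by default on MSVC, so channel values above 127 would come out negative there and skew Cb and Cr.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the per-pixel test in ChromakeyKernel.
// Input:  packed B, G, R bytes (3 per pixel), the layout the kernel reads.
// Output: packed 4-byte pixels, the layout the kernel writes.
std::vector<unsigned char> ChromakeyCpuReference(
    const std::vector<unsigned char>& bgr, int width, int height,
    int filterCr, int filterCb, int toleranceA, int toleranceB)
{
    const std::size_t pixels =
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    assert(bgr.size() == pixels * 3);

    const int squaredToleranceA = toleranceA * toleranceA;
    const int squaredToleranceB = toleranceB * toleranceB;

    std::vector<unsigned char> result(pixels * 4);
    for (std::size_t p = 0; p < pixels; ++p) {
        // unsigned char keeps channel values in 0..255 (plain char may be signed).
        const int b = bgr[p * 3];
        const int g = bgr[p * 3 + 1];
        const int r = bgr[p * 3 + 2];

        // Same RGB -> Cb/Cr formulas as the posted kernel.
        const float cb = static_cast<float>(128 + -0.168736 * r - 0.331264 * g + 0.5 * b);
        const float cr = static_cast<float>(128 + 0.5 * r - 0.418688 * g - 0.081312 * b);

        // Squared distance to the filter color in the Cb/Cr plane.
        const float dCb = filterCb - cb;
        const float dCr = filterCr - cr;
        const float dist2 = dCb * dCb + dCr * dCr;

        unsigned char* out = &result[p * 4];
        if (squaredToleranceA <= dist2 && dist2 <= squaredToleranceB) {
            // Keyed pixel: same placeholder bytes as the posted kernel.
            out[0] = 0; out[1] = 1; out[2] = 2; out[3] = 3;
        } else {
            // Pixel kept: copy B, G, R and make it fully opaque.
            out[0] = static_cast<unsigned char>(b);
            out[1] = static_cast<unsigned char>(g);
            out[2] = static_cast<unsigned char>(r);
            out[3] = 255;
        }
    }
    return result;
}
```

To use it, run the .dll and this reference on the same image with the same filter values and compare the two output arrays byte by byte. Mismatches limited to pixels with a channel value above 127 would be consistent with the signed char difference described above.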