Calling CUDA functions from C source code

Hi guys, I'm new to CUDA and I find it awesome! I'm developing some functions that will be called from another, large C program. These functions consist mostly of floating-point operations, so I would like to run them with CUDA. The main program is too big to rewrite in CUDA, so I would like to call the CUDA code from the main C program. Is that possible?

Thank you, and sorry for my English! :D

bye bye

Hello, it seems to be possible. I've just finished writing a simple example to understand it myself. (It's simple, but it works =) )

(It's based on videos by Sarnath; many thanks to him: http://forums.nvidia.com/index.php?showtop…;hl=create+dll)

First I created a standard DLL project in Visual Studio and added a *.cu file.

(In the properties for this file, the “Command line” property was set according to Sarnath's advice: nvcc -I"$(CUDA_INC_PATH)" -c -o $(ConfigurationName)\cuda_kernel.obj cuda_kernel.cu)

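#include <stdio.h>

// Each thread computes one element: dest[i] = 2*src[i] + 1.
__global__ void my_intcopy(int *src, int *dest, int n)
{
	int i=threadIdx.x;
	if (i < n)	// guard against threads past the end of the arrays
		dest[i] = src[i]*2+1;
}

// Host-side wrapper exported from the DLL.
// Only one block is launched, so n must not exceed the device's
// maximum number of threads per block.
__declspec(dllexport)
void my_intcopy_caller(int *Source, int *Dest, int n)
{
	printf("INITIALIZING \n");
	dim3 grid(1), block(n);
	int *d_S, *d_D;
	cudaMalloc((void**) &d_S, n*sizeof(int));
	cudaMalloc((void**) &d_D, n*sizeof(int));
	cudaMemcpy(d_S, Source, n*sizeof(int), cudaMemcpyHostToDevice);
	printf("RUNNING \n");
	my_intcopy<<<grid,block>>>(d_S, d_D, n);
	if (n > 7)
		Dest[7]=777;	// host-side sentinel; the copy-back below overwrites it
	printf("DONE \n");
	cudaMemcpy(Dest, d_D, n*sizeof(int), cudaMemcpyDeviceToHost);
	cudaThreadSynchronize();	// newer toolkits deprecate this in favor of cudaDeviceSynchronize()
	cudaFree(d_S);	// release the device buffers
	cudaFree(d_D);
}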
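
A note on the launch configuration: the caller above starts a single block of n threads, so it only works while n stays within the per-block thread limit of the GPU (512 or 1024 on the devices I know of). The usual way around this is to launch several blocks and compute a global index in the kernel. Below is a minimal sketch of that indexing, written as plain C++ that emulates the blocks and threads with loops so it can be tried without a GPU. The names blocks_for and my_intcopy_emulated are hypothetical and not part of the project above.

```cpp
// Number of blocks needed to cover n elements (ceiling division).
int blocks_for(int n, int threadsPerBlock)
{
	return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// CPU emulation of the indexing a multi-block kernel would use.
// In a real kernel the index would be
//   blockIdx.x * blockDim.x + threadIdx.x
void my_intcopy_emulated(const int *src, int *dest, int n, int threadsPerBlock)
{
	int numBlocks = blocks_for(n, threadsPerBlock);
	for (int block = 0; block < numBlocks; block++)
	{
		for (int thread = 0; thread < threadsPerBlock; thread++)
		{
			int i = block * threadsPerBlock + thread;
			if (i < n)	// the last block may be only partly used
				dest[i] = src[i] * 2 + 1;
		}
	}
}
```

In the DLL this would correspond to a launch such as my_intcopy<<<blocks_for(n, 256), 256>>>(d_S, d_D, n), with the kernel index computed as in the comment. The block size of 256 is only an assumption for illustration.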

Then I created a console application with the following code:

// cuda_app.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>

// No CUDA headers are needed here: the application only talks to the DLL.

int _tmain(int argc, _TCHAR* argv[])
{
	int *h_src, *h_dest;
	int k = 10;	// number of elements
	HINSTANCE LoadMe;
	typedef void (*funcPtr)(int*, int*, int);
	funcPtr LibFunc;

	h_src = (int*)malloc(k*sizeof(int));
	h_dest = (int*)malloc(k*sizeof(int));
	for(int i = 0; i<k; i++)
	{
		h_src[i]=i*i;
		h_dest[i]=i;
	}

	LoadMe = LoadLibrary(_T("CUDA_DLL.dll"));//Here the DLL is loaded
	if (LoadMe == 0)
	{
		printf("LoadMe library failed to load!\n");
		return 1;
	}
	printf("LoadMe library loaded!\n");

	//Here the function is being loaded.
	//I was too lazy to create a .def file for the DLL, so I've just extracted the decorated function name from the DLL =)
	//(The decorated name depends on the compiler and differs between 32- and 64-bit builds.)
	LibFunc = (funcPtr)GetProcAddress(LoadMe, "?my_intcopy_caller@@YAXPAH0H@Z");
	if (LibFunc == 0)
	{
		printf("Function not found in the DLL!\n");
		return 1;
	}
	LibFunc(h_src, h_dest, k);

	for(int i=0; i<k; i++)
		printf("%d %d\t", h_src[i], h_dest[i]);

	getchar();	// keep the console window open until Enter is pressed
	FreeLibrary(LoadMe);
	free(h_src);
	free(h_dest);
	return 0;
}
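
The decorated name passed to GetProcAddress above exists because the caller is exported with C++ linkage. Declaring the export as extern "C" should avoid that, and it is also what a plain C program needs in order to declare and call the function. Here is a minimal sketch, under some assumptions: MY_EXPORT is a hypothetical macro, and the loop is a CPU stand-in for the cudaMalloc / kernel launch / cudaMemcpy sequence so the sketch compiles without CUDA or Windows.

```cpp
// Hypothetical portability macro: __declspec(dllexport) on MSVC,
// nothing elsewhere so the sketch also builds on other compilers.
#ifdef _MSC_VER
#define MY_EXPORT __declspec(dllexport)
#else
#define MY_EXPORT
#endif

// extern "C" switches off C++ name decoration for this symbol.
extern "C" MY_EXPORT void my_intcopy_caller(int *Source, int *Dest, int n)
{
	// Stand-in for the GPU work done in the real DLL.
	for (int i = 0; i < n; i++)
		Dest[i] = Source[i] * 2 + 1;
}

// Same function-pointer type the console application uses.
typedef void (*funcPtr)(int *, int *, int);
```

With an export like this, the lookup in the application should become GetProcAddress(LoadMe, "my_intcopy_caller"), and a C source file can declare void my_intcopy_caller(int*, int*, int); and link against the DLL's import library instead of loading it by hand.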

I have to embed a few functions into a large project too, and with this approach it seems that I can use CUDA functions without any use of the CUDA API in the main project. Is that true, or could some problems (like simultaneous access to the GPU) appear?
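
If simultaneous calls do turn out to be a problem, one conservative option I can think of is to let only one thread at a time into the DLL. I have not verified that this is actually necessary, so treat the following as a sketch of the idea rather than a requirement. The names call_serialized and g_gpu_mutex are hypothetical, and std::mutex needs a C++11 compiler.

```cpp
#include <mutex>

// Same function-pointer type as in the console application.
typedef void (*funcPtr)(int *, int *, int);

// Hypothetical: a single lock shared by every call that touches the GPU.
static std::mutex g_gpu_mutex;

// Calls gpuFunc while holding the lock, so two host threads never run
// the GPU wrapper at the same time.
void call_serialized(funcPtr gpuFunc, int *Source, int *Dest, int n)
{
	std::lock_guard<std::mutex> lock(g_gpu_mutex);	// released on return
	gpuFunc(Source, Dest, n);
}
```

The main project would then call call_serialized(LibFunc, h_src, h_dest, k) instead of calling LibFunc directly. The cost is that GPU calls from different threads can no longer overlap.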