Calling CUDA functions from C source code

Hi guys, I’m new to CUDA and I think it’s awesome! I’m developing some functions that will be called from another, large C program. These functions consist mostly of floating-point operations, so I would like to run them with CUDA. The main program is too big to rewrite in CUDA, so I would like to call the CUDA code from the main C program. Is that possible?

Thank you, and sorry for my English! :D

bye bye

Hello, it seems to be possible. I’ve just finished writing a simple example to understand it myself. (It’s simple, but it works =) )

(It’s based on videos by Sarnath. Many thanks to him: …;hl=create+dll)

First I created a standard DLL project in Visual Studio and added a *.cu file.

(In the properties for this file, the “Command line” property was set according to Sarnath’s advice as well: nvcc -I"$(CUDA_INC_PATH)" -c -o $(ConfigurationName)\cuda_kernel.obj)

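#include <stdio.h>

// Kernel: each thread handles one element, dest[i] = 2*src[i] + 1.
__global__ void my_intcopy(int *src, int *dest, int n)
{
	int i = threadIdx.x;
	if (i < n)
		dest[i] = src[i]*2+1;
}

// Host-side wrapper exported from the DLL; the caller needs no CUDA headers.
// __declspec(dllexport) is assumed here: without it (or a .def file) the DLL exports nothing.
__declspec(dllexport) void my_intcopy_caller(int *Source, int *Dest, int n)
{
	printf("INITIALIZING \n");
	dim3 grid(1), block(n);	// one block of n threads, so n must not exceed the per-block thread limit
	int *d_S, *d_D;
	cudaMalloc((void**) &d_S, n*sizeof(int));
	cudaMalloc((void**) &d_D, n*sizeof(int));
	cudaMemcpy(d_S, Source, n*sizeof(int), cudaMemcpyHostToDevice);
	printf("RUNNING \n");
	my_intcopy<<<grid,block>>>(d_S, d_D, n);
	printf("DONE \n");
	// This blocking copy also waits for the kernel to finish.
	cudaMemcpy(Dest, d_D, n*sizeof(int), cudaMemcpyDeviceToHost);
	cudaFree(d_S);
	cudaFree(d_D);
}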
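A note on the launch configuration: `dim3 grid(1), block(n)` uses a single block, so n is capped by the per-block thread limit (1024 on current GPUs, 512 on the oldest ones). Below is a minimal sketch of how the number of blocks could be computed for larger n, written in plain C so it can be checked without a GPU. The helper name `blocks_for` and the block size of 256 are assumptions for illustration, not part of the original code.

```c
#include <assert.h>

/* Hypothetical helper: number of blocks needed so that
   blocks * threads_per_block >= n (ceiling division). */
int blocks_for(int n, int threads_per_block)
{
    assert(n >= 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}
```

With this, the wrapper could launch `my_intcopy<<<blocks_for(n, 256), 256>>>(d_S, d_D, n)` and the kernel would compute its index as `blockIdx.x * blockDim.x + threadIdx.x`. The kernel would still need a bounds check against n, because the last block may be only partly used.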



Then I created a console application with the following code:


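// cuda_app.cpp : Defines the entry point for the console application.

#include "stdafx.h"
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime_api.h>
#include <cuda.h>
#include <windows.h>
#include <atlconv.h>

int _tmain(int argc, _TCHAR* argv[])
{
	int *h_src, *h_dest;//, *d_src, *d_dest;
	int k = 10;	// element count passed to the DLL (assumed; it must be nonzero for the kernel to launch)

	typedef void (*funcPtr)(int*, int*, int);
	funcPtr LibFunc;
	HINSTANCE LoadMe;

	h_src = (int*)malloc(10*sizeof(int));
	h_dest = (int*)malloc(10*sizeof(int));

	for(int i = 0; i<10; i++)
	{
		// Sample input values (assumed)
		h_src[i] = i;
		h_dest[i] = 0;
	}

	LoadMe = LoadLibrary(_T("CUDA_DLL.dll"));//Here the DLL is loaded
	if (LoadMe != 0)
		printf("LoadMe library loaded!\n");
	else
	{
		printf("LoadMe library failed to load!\n");
		return 1;
	}

	LibFunc = (funcPtr)GetProcAddress(LoadMe, "?my_intcopy_caller@@YAXPAH0H@Z");//Here the function is being loaded
	//I was too lazy to create a .def file for the DLL, so I just extracted the mangled function name from the DLL =)
	if (LibFunc == NULL)
	{
		printf("Function not found in DLL!\n");
		return 1;
	}

	LibFunc(h_src, h_dest, k);

	// Print each input next to its result (assumed output format)
	for(int i=0; i<10; i++)
	{
		printf("%d ", h_src[i]);
		printf("%d\t", h_dest[i]);
	}

	FreeLibrary(LoadMe);
	free(h_src);
	free(h_dest);
	return 0;
}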
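The decorated name `?my_intcopy_caller@@YAXPAH0H@Z` appears because the .cu file is compiled as C++. If the wrapper is declared with C linkage, the export typically keeps a plain name, and a pure C program can link against it or look it up as `my_intcopy_caller`. Below is a minimal sketch, assuming a hypothetical shared header. `my_intcopy_reference` is also a made-up CPU function, added only so the GPU result can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared header (e.g. cuda_wrapper.h), included by both the
   .cu file and the C program. The guard lets a C++ compiler (nvcc) see
   extern "C" while a C compiler sees a plain prototype. */
#ifdef __cplusplus
extern "C" {
#endif

void my_intcopy_caller(int *Source, int *Dest, int n);

#ifdef __cplusplus
}
#endif

/* Hypothetical CPU reference with the same arithmetic as the kernel,
   useful for comparing against what the GPU returns. */
void my_intcopy_reference(const int *src, int *dest, int n)
{
    assert(src != NULL && dest != NULL && n >= 0);
    for (int i = 0; i < n; i++)
        dest[i] = src[i] * 2 + 1;
}
```

The definition of `my_intcopy_caller` in the .cu file would include the same header, so that nvcc gives it C linkage as well.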


I have to embed a few functions in a large project too, and with this approach it seems that I can use CUDA functions without any use of the CUDA API in the main project. Is that true, or could some problems (like simultaneous access to the GPU) appear?
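On keeping the main project free of the CUDA API: the caller only ever sees an ordinary function pointer, so it needs no CUDA headers. One thing the example does not do is report failures (no GPU present, DLL missing, a CUDA call failing). Below is a hedged sketch of how the C side might cope. It assumes a hypothetical variant of the wrapper that returns 0 on success and nonzero on error; `intcopy_fn` and `run_intcopy` are illustrative names, not from the posted code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature: like my_intcopy_caller, but returning
   0 on success and nonzero if any CUDA call failed. */
typedef int (*intcopy_fn)(const int *src, int *dest, int n);

/* Runs the GPU implementation if one was loaded; otherwise, or if it
   reports an error, computes the same result on the CPU.
   Returns 1 if the GPU path was used, 0 if the CPU fallback ran. */
int run_intcopy(intcopy_fn gpu_impl, const int *src, int *dest, int n)
{
    assert(src != NULL && dest != NULL && n >= 0);
    if (gpu_impl != NULL && gpu_impl(src, dest, n) == 0)
        return 1;
    for (int i = 0; i < n; i++)
        dest[i] = src[i] * 2 + 1;
    return 0;
}
```

In the console application, `gpu_impl` would be the pointer returned by GetProcAddress, or NULL if the DLL or the function could not be loaded.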