How can I compile CUDA code then link it to a C++/CLR project

I am trying to add a CUDA file to my existing C++ Visual Studio 2013 project. I have the CUDA 8.0 SDK installed and have created a new .cu file, but I cannot set its Item Type to CUDA C/C++ in the file properties. When compiling the project, I get a lot of errors like this:

.NETFramework,Version=v4.5.1.AssemblyAttributes.cpp
2>GpuManagerSdkWrapper.obj : error LNK2028: unresolved token (0A00002A) "public: int __cdecl CudaWrapper::mycudafunction(void)" (?mycudafunction@CudaWrapper@@$$FQEAAHXZ), referenced in function "public: int __cdecl diondo_GpuManager::CudaManager::test(void)" (?test@CudaManager@diondo_GpuManager@@$$FQEAAHXZ)
2>GpuManagerSdkWrapper.obj : error LNK2028: unresolved token (0A00002B) "public: __cdecl CudaWrapper::CudaWrapper(void)" (??0CudaWrapper@@$$FQEAA@XZ), referenced in function "public: int __cdecl diondo_GpuManager::CudaManager::test(void)" (?test@CudaManager@diondo_GpuManager@@$$FQEAAHXZ)
2>GpuManagerSdkWrapper.obj : error LNK2019: unresolved external symbol "public: __cdecl CudaWrapper::CudaWrapper(void)" (??0CudaWrapper@@$$FQEAA@XZ) referenced in function "public: int __cdecl diondo_GpuManager::CudaManager::test(void)" (?test@CudaManager@diondo_GpuManager@@$$FQEAAHXZ)
2>GpuManagerSdkWrapper.obj : error LNK2019: unresolved external symbol "public: int __cdecl CudaWrapper::mycudafunction(void)" (?mycudafunction@CudaWrapper@@$$FQEAAHXZ) referenced in function "public: int __cdecl diondo_GpuManager::CudaManager::test(void)" (?test@CudaManager@diondo_GpuManager@@$$FQEAAHXZ)
2>E:\Transfer\La\CT-Software\diondo-CT-Program\GUI\WpfGui\x64\Release\GpuManagerSdkWrapper.dll : fatal error LNK1120: 4 unresolved externals
========== Rebuild All: 1 succeeded, 1 failed, 0 skipped ==========

My simple code looks like this:

The header file:

//+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
//                           CudaWrapper.h
//+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

#ifndef _CUDA_WRAPPER_H_
#define _CUDA_WRAPPER_H_

extern "C"
{
#include <string.h>
#include <math.h>
}

extern "C" {

	/* CUDA prototype, can be renamed to "cudaadvance" or the like */
	int myfunction_(void);
}

class CudaWrapper
{
public:
	CudaWrapper();   /* Initialize with default setup */
	~CudaWrapper();  /* Free memory */

	int mycudafunction(void);
};

#endif

//+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
//                           CudaWrapper.cu
//+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

#include "stdafx.h"
#include "CudaWrapper.h"
#include <stdio.h>
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
//
#include <iostream>
#include <fstream>

using namespace std;

// Defines for IntelliSense
#define _SIZE_T_DEFINED
#ifndef __CUDACC__
#define __CUDACC__
#endif
#ifndef __cplusplus
#define __cplusplus
#endif

/********************/
/* CUDA ERROR CHECK */
/********************/
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort = true)
{
	if (code != cudaSuccess)
	{
		fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
		if (abort) exit(code);
	}
}
__global__ void Mykernel(void)
{

}

extern "C" int myfunction_(void)
{
	Mykernel<<<1, 1>>>();
	//printf("Hello, World!\n");
	return 3;
}
CudaWrapper::CudaWrapper()
{

}

CudaWrapper::~CudaWrapper()
{

}

int CudaWrapper::mycudafunction()
{
	return myfunction_();
}

What do these unresolved externals mean, and what is the right way to integrate CUDA into a C++ application so it can use an existing C++ class?

Thanks in advance!
AminLadal

Is the option '-Xcompiler -clr' definitely specified for nvcc?

C++/CLI version of familiar addWithCuda:

AddWithCuda.cpp

#include "stdafx.h"

using namespace System;

#include "cuda_runtime.h"

cudaError_t addWithCuda(int *c, const int *a, const int *b, unsigned int size);

int main(array<System::String ^> ^args) {
    Console::WriteLine(L"Hello World");

    const int arraySize = 5;
    const int a[arraySize] = { 1, 2, 3, 4, 5 };
    const int b[arraySize] = { 10, 20, 30, 40, 50 };
    int c[arraySize] = { 0 };

    // Add vectors in parallel.
    cudaError_t cudaStatus = addWithCuda(c, a, b, arraySize);
    if (cudaStatus != cudaSuccess) {
        Console::Error->WriteLine(L"addWithCuda failed!");
        return 1;
    }

    Console::WriteLine(L"[1,2,3,4,5] + [10,20,30,40,50] = {0},{1},{2},{3},{4}\n",
        c[0], c[1], c[2], c[3], c[4]);

    // cudaDeviceReset must be called before exiting in order for profiling and
    // tracing tools such as Nsight and Visual Profiler to show complete traces.
    cudaStatus = cudaDeviceReset();
    if (cudaStatus != cudaSuccess) {
        Console::Error->WriteLine(L"cudaDeviceReset failed!"); 
        return 1;
    }

    return 0;
}

// Helper function for using CUDA to add vectors in parallel.
cudaError_t addWithCuda(int *c, const int *a, const int *b, unsigned int size) {
    extern void addKernel(int* c, const int* a, const int* b);
    int *dev_a = 0;
    int *dev_b = 0;
    int *dev_c = 0;
    cudaError_t cudaStatus;

    // Choose which GPU to run on, change this on a multi-GPU system.
    cudaStatus = cudaSetDevice(0);
    if (cudaStatus != cudaSuccess) {
        Console::Error->WriteLine(L"cudaSetDevice failed!  Do you have a CUDA-capable GPU installed?");
        goto Error;
    }

    // Allocate GPU buffers for three vectors (two input, one output).
    cudaStatus = cudaMalloc((void**)&dev_c, size * sizeof(int));
    if (cudaStatus != cudaSuccess) {
        Console::Error->WriteLine(L"cudaMalloc failed!");
        goto Error;
    }

    cudaStatus = cudaMalloc((void**)&dev_a, size * sizeof(int));
    if (cudaStatus != cudaSuccess) {
        Console::Error->WriteLine(L"cudaMalloc failed!");
        goto Error;
    }

    cudaStatus = cudaMalloc((void**)&dev_b, size * sizeof(int));
    if (cudaStatus != cudaSuccess) {
        Console::Error->WriteLine(L"cudaMalloc failed!");
        goto Error;
    }

    // Copy input vectors from host memory to GPU buffers.
    cudaStatus = cudaMemcpy(dev_a, a, size * sizeof(int), cudaMemcpyHostToDevice);
    if (cudaStatus != cudaSuccess) {
        Console::Error->WriteLine(L"cudaMemcpy failed!");
        goto Error;
    }

    cudaStatus = cudaMemcpy(dev_b, b, size * sizeof(int), cudaMemcpyHostToDevice);
    if (cudaStatus != cudaSuccess) {
        Console::Error->WriteLine(L"cudaMemcpy failed!");
        goto Error;
    }

    // Launch a kernel on the GPU with one thread for each element.
    //  replace addKernel<<<1, size>>>(dev_c, dev_a, dev_b); with:
    void* args[] = { &dev_c, &dev_a, &dev_b };
    cudaStatus = cudaLaunchKernel(
      (const void*)&addKernel, // pointer to kernel func.
                      dim3(1), // grid
                   dim3(size), // block
                         args  // arguments
      );

    // Check for any errors launching the kernel
    if (cudaStatus != cudaSuccess) {
        Console::Error->WriteLine(L"addKernel launch failed: {0}\n", gcnew String(cudaGetErrorString(cudaStatus)));
        goto Error;
    }
    
    // cudaDeviceSynchronize waits for the kernel to finish, and returns
    // any errors encountered during the launch.
    cudaStatus = cudaDeviceSynchronize();
    if (cudaStatus != cudaSuccess) {
        Console::Error->WriteLine(L"cudaDeviceSynchronize returned error code {0} after launching addKernel!\n", (int)cudaStatus);
        goto Error;
    }

    // Copy output vector from GPU buffer to host memory.
    cudaStatus = cudaMemcpy(c, dev_c, size * sizeof(int), cudaMemcpyDeviceToHost);
    if (cudaStatus != cudaSuccess) {
        Console::Error->WriteLine(L"cudaMemcpy failed!");
        goto Error;
    }

Error:
    cudaFree(dev_c);
    cudaFree(dev_a);
    cudaFree(dev_b);
    
    return cudaStatus;
}

and addKernel.cu:

#include "device_launch_parameters.h"

__global__ void addKernel(int *c, const int *a, const int *b)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

NOTICE: nvcc can't understand C++/CLI grammar,
and VC++ can't understand kernel<<<grid,block>>>(),
so I replaced addKernel<<<…>>> with the cudaLaunchKernel() API.

Thank you very much, it works. Exactly what I need, thanks again!!

I have an error at kernel launch.
I can't understand how to call cudaLaunchKernel() when my kernel is: __global__ void addKernel(int A1, int A2, int A3, int A4, int A5, int A6, int y, double *FX).

I launched the kernel like this: addKernel<<<(N+127)/128, 128>>>(A1, A2, A3, A4, A5, A6, y, dev_FX);
where: double *dev_FX = NULL; double FX[N]; #define N (11*105).

Though it looks like this:
void* args[] = { &dev_c, &dev_a, &dev_b };
cudaStatus = cudaLaunchKernel(
    (const void*)&addKernel, // pointer to kernel func.
    dim3(1),                 // grid
    dim3(size),              // block
    args                     // arguments
);

The remaining parameters are: void** args, size_t sharedMem = 0U, cudaStream_t stream = (cudaStream_t)0. What should they be in my case?

I do this: cudaStatus = cudaLaunchKernel((const void*)&addKernel, (N+127)/128, 128…);
and I don't know how to go further.

// foo.cpp
#include <cuda_runtime.h>
void addKernel (int A1, int A2, int A3, int A4, int A5, int A6, int y, double * FX);

int main() {
  int A1 = 1;
  int A2 = 2;
  int A3 = 3;
  int A4 = 4;
  int A5 = 5;
  int A6 = 6;
  int y  = 7;
  double* FX = nullptr;
  void* args[] = { &A1, &A2, &A3, &A4, &A5, &A6, &y, &FX };
  cudaLaunchKernel(
        (const void*)&addKernel, // pointer to kernel func.
                      dim3(1), // grid
                      dim3(1), // block
                         args  // arguments
      );
  cudaDeviceReset();
}
// kernel.cu
#include <device_launch_parameters.h>
#include <stdio.h>

__global__ void addKernel (int A1, int A2, int A3, int A4, int A5, int A6, int y, double * FX) {
  printf("addKernel(%d,%d,%d,%d,%d,%d,%d,%p)\n",
         A1, A2, A3, A4, A5, A6, y, FX);
}

on console:

d:\work>nvcc -Xcompiler -wd4819 foo.cpp kernel.cu
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
foo.cpp
kernel.cu
   Creating library a.lib and object a.exp

d:\work>a.exe
addKernel(1,2,3,4,5,6,7,0000000000000000)

d:\work>

…any problem?

  1. I don't know what to do with the "on console:" part. Is this the editor for compiler parameters?
  2. Why use printf("addKernel(%d,%d,%d,%d,%d,%d,%d,%p)\n", A1, A2, A3, A4, A5, A6, y, FX); if I don't need console output?
  3. I need to pass the correct parameters to the cudaLaunchKernel function. What parameters should I write in my program?

Maybe you have references to a step-by-step tutorial (for creating an interface for a CUDA program)?

Earlier I only programmed C++ (console and GUI) and CUDA C++ (console) in Visual Studio 2012. I am not an expert in the fine settings of compilers, etc.

  1. I don't know what to do with the "on console:" part. Is this the editor for compiler parameters?

You will find "VS20xx x64 Native Tools Command Prompt" in /Common7/Tools/Shortcuts.
I used the console only to demonstrate how to call the kernel function from the host;
of course you should build your app in the IDE.

  2. Why use printf("addKernel(%d,%d,%d,%d,%d,%d,%d,%p)\n", A1, A2, A3, A4, A5, A6, y, FX); if I don't need console output?

To verify that all parameters are surely passed to the __global__ function.

  3. I need to pass the correct parameters to the cudaLaunchKernel function. What parameters should I write in my program?

Modify the * placeholders below as you want.

  int A1 = *;
  int A2 = *;
  int A3 = *;
  int A4 = *;
  int A5 = *;
  int A6 = *;
  int y  = *;
  double* FX = *;
  void* args[] = { &A1, &A2, &A3, &A4, &A5, &A6, &y, &FX };
  cudaLaunchKernel(
        (const void*)&addKernel, // pointer to kernel func.
                    dim3(*,*,*), // grid
                    dim3(*,*,*), // block
                          args  // arguments
        );

What should I write in the "args" parameter? Which parameters are necessary?
All my code is like this:
// include libraries
#include "cuda_runtime.h"
#define _USE_MATH_DEFINES
#include "device_launch_parameters.h"
#include
#include
#include <math.h>
#include <driver_types.h>
#include "book.h"
#include <stdio.h>

#define N (101)

__device__ double fD(int a1, int a2, int a3, int a4, int d) // computes the distribution parameters
{
	const int d1=20;
	const int d2=40;
	const int d3=60;
	const int d4=80;
	if (d<=d1) return a1/100.0; else
	if (d>d1 && d<=d2) return a2/100.0; else
	if (d>d2 && d<=d3) return a3/100.0; else
	if (d>d3 && d<=d4) return a4/100.0; else
	return (1-a1/100.0-a2/100.0-a3/100.0-a4/100.0);
}

__global__ void kernel(int A1, int A2, int A3, int A4, int dlinaPuti, int y, double *FX) // kernel function
{
	int fotodiod[5]={10100, 30100, 50100, 70100, 90100};
	int tid=blockIdx.x; // process the data at this index
	const double dlinavolny=0.6;
	double itogovaja_dlina_puti=1400000.0;
	if (tid>0 && tid<N)
		FX[tid]=(atan(fotodiod[y]/(itogovaja_dlina_puti-dlinaPuti*100000))*M_PI*(tid*1.0)/dlinavolny)*pow((tid*1.0),2)*fD(A1,A2,A3,A4,tid);
}

int main()
{
	double *massivA1=new double [316251]; // array storing values of parameter A1
	double *massivA2=new double [316251]; // array storing values of parameter A2
	double *massivA3=new double [316251]; // array storing values of parameter A3
	double *massivA4=new double [316251]; // array storing values of parameter A4
	double *massivA5=new double [316251]; // array storing values of parameter A5
	double *indikatrissa_teorija=new double [1581255];
	int R=0;
	double summFX=0.0;
	double summa = 0.0;
	double *dev_FX=NULL;
	double FX[N];
	int schetchik=0; // array index counter
	HANDLE_ERROR(cudaMalloc((void**)&dev_FX, N*sizeof(double))); // allocate memory on the GPU

	for (int A1=0; A1<=100; A1=A1+5)
	{
		for (int A2=0; A2<=100-A1; A2=A2+5)
		{
			for (int A3=0; A3<=100-A1-A2; A3=A3+5)
			{
				for (int A4=0; A4<=100-A1-A2-A3; A4=A4+5)
				{
					for (int y=0; y<=4; y=y+1)
					{
						for (int dlina_puti=0; dlina_puti<=10; dlina_puti=dlina_puti+1)
						{
							kernel<<<N, 1>>>(A1, A2, A3, A4, dlina_puti, y, dev_FX); // launch the kernel function on the GPU
							HANDLE_ERROR(cudaMemcpy(FX, dev_FX, N*sizeof(double), cudaMemcpyDeviceToHost)); // copy the "FX" array from GPU to CPU
							// sum the values computed on the GPU
							for (int i=1; i<N; i=i+1)
							{
								summa=summa+FX[i];
							}
							summFX=summFX+summa;
							summa=0.0; // reset the sum
						}
						indikatrissa_teorija[R]=summFX*M_PI;
						R=R+1;
						summFX=0.0;
					}
					massivA1[schetchik]=A1/100.0;
					massivA2[schetchik]=A2/100.0;
					massivA3[schetchik]=A3/100.0;
					massivA4[schetchik]=A4/100.0;
					schetchik=schetchik+1;
				}
			}
		}
	}

	cudaFree(dev_FX); // free the memory allocated on the GPU

	return 0;
}

On BUTTON_1 click, I want to run the code from "for (int A1=0; A1<=100; A1=A1+5)" through "cudaFree(dev_FX); // free the memory allocated on the GPU" and output the result to Label1.
Our questions and answers are a lot of separate phrases, and I got tangled. I need the sequence of actions to solve my task: point 1…, point 2…, point 3…, etc.

Sorry, I can't understand what you mean.

I believe you know

  • how to make Windows Form app using C++/CLI
  • how to call kernel-func.

anything else?

  • "how to make Windows Form app using C++/CLI"

  • How to create Windows Forms I know: https://www.youtube.com/watch?v=AP8Tz9RfbxE

  • how to call kernel-func.

  • How to call a kernel function I know (console): literature: Sanders J., Kandrot E., CUDA by Example: An Introduction to General-Purpose GPU Programming [2010, PDF, ENG] + code, page 46.

Is there similar literature, an instruction, a video, or an example of how to connect CUDA to a Windows Forms app + output to the form?

Is there an example of how to use the cudaLaunchKernel function?

  an example of how to connect CUDA to a Windows Forms app + output to the form?

UI interaction (UI -> host / host -> UI) is independent of CUDA.

I made a simple example. I sent a message to you, Anatoly.

TOTAL:
Create a CUDA application with a GUI in Visual Studio 2012 in C++:

  1. File -> New -> Project -> Visual C++ -> CLR -> CLR Empty Project.
  2. Project -> Add New Item -> UI -> Windows Form.
  3. Project -> Add New Item -> NVIDIA CUDA 8.0 -> Code -> CUDA C/C++ File.
  4. Your project -> Build Customizations… -> CUDA 8.0
  5. Project -> Properties -> Linker -> Input -> Additional Dependencies -> "cudart.lib;"
  6. Project -> Properties -> Linker -> System -> SubSystem -> Windows (/SUBSYSTEM:WINDOWS)
  7. Project -> Properties -> Linker -> Advanced -> Entry Point -> "main"
  8. Files like this (sample):

/*-------------------------------------------------------*/
// MyForm.h
#pragma once
#include "cuda_runtime.h"


private: System::Void button1_Click(System::Object^ sender, System::EventArgs^ e)
{
	const int a[5] = { 1, 2, 3, 4, 5 };
	const int b[5] = { 10, 20, 30, 40, 50 };
	int c[5] = { 0 };
	extern void addKernel(int *c, const int *a, const int *b);
	int *dev_a = 0;
	int *dev_b = 0;
	int *dev_c = 0;

	cudaMalloc((void**)&dev_c, 5 * sizeof(int));
	cudaMalloc((void**)&dev_a, 5 * sizeof(int));
	cudaMalloc((void**)&dev_b, 5 * sizeof(int));

	cudaMemcpy(dev_a, a, 5 * sizeof(int), cudaMemcpyHostToDevice);
	cudaMemcpy(dev_b, b, 5 * sizeof(int), cudaMemcpyHostToDevice);

	void* args1[] = { &dev_c, &dev_a, &dev_b };
	cudaLaunchKernel<void>(&addKernel, dim3(1), dim3(5), args1);

	cudaDeviceSynchronize();
	cudaMemcpy(c, dev_c, 5 * sizeof(int), cudaMemcpyDeviceToHost);

	label12->Text=c[0].ToString();
	label13->Text=c[1].ToString();
	label14->Text=c[2].ToString();
	label15->Text=c[3].ToString();
	label16->Text=c[4].ToString();

	cudaFree(dev_c);
	cudaFree(dev_a);
	cudaFree(dev_b);
}

/*-------------------------------------------------------*/
// *.cu file
#include "device_launch_parameters.h"

__global__ void addKernel(int *c, const int *a, const int *b)
{
	int i = threadIdx.x;
	c[i] = a[i] + b[i];
}
/*-------------------------------------------------------*/
// MyForm.cpp

#include "MyForm.h"
#include <stdio.h>
struct CUstream_st{};
struct CUevent_st{};

using namespace std;
using namespace System;
using namespace System::Windows::Forms;
[STAThread]

void main()
{
	Application::EnableVisualStyles();
	Application::SetCompatibleTextRenderingDefault(false);
	Project1::MyForm form;
	Application::Run(%form);
}
/*-------------------------------------------------------*/
// Thanks for the help to comrade episteme-AMBASSADOR!
// We are happily using it!!!

my pleasure.

Since cudaMalloc/cudaFree take time, it is better to keep dev_a, dev_b, and dev_c as member variables, allocated before the kernel is invoked.

I have been trying to do something similar, but my problem is that I do not see what’s mentioned in Step 3 by Anatoly_88. There is no “Add New Items->NVIDIA CUDA 8.0->Coda-> CUDA C/C++ File.” option for me.

I tried adding a new file anyway with a .cu extension, and the kernel defined there was called from my main form application using cudaLaunchKernel(). But now I receive a message (in a pop-up dialog box) saying "Debug Assertion Failed!!". The expression mentions _CrtIsValidHeapPointer(block).

When I remove the kernel launch statement, I still have the error. But when I remove the extern statement, the kernel launch statement, and the .cu file, everything works fine (by which I mean there is no error). This means CUDA is working, but the call to the kernel or (since the problem is with the heap) moving the device pointer (to a, b, or c) might be causing the error in my case (although cudaMemcpy returns cudaSuccess).

It's the same steps that I am following, and similar (almost the same) code.

So my problem is :

  1. How to get step 3(of Anatoly_88’s last comment) correct?
  2. What might be causing the _CrtIsValidHeapPointer(), and how do I solve this (this is the first time I have faced this error)?
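One avenue worth checking for question 2 (an assumption on my part, not something confirmed in this thread): _CrtIsValidHeapPointer assertions are a classic symptom of mixing CRT variants, i.e. memory allocated on one C runtime heap and freed on another. C++/CLI (/clr) code must use the DLL CRT (/MD or /MDd), while nvcc-built objects are often linked against the static CRT and cudart_static.lib. A possible configuration to try when compiling the .cu file:

```shell
# Hedged guess, not a confirmed diagnosis for this thread:
# make the .cu object use the same (DLL) C runtime as the /clr code.
nvcc -c kernel.cu -Xcompiler "/MDd"   # debug build; use "/MD" for release
# ...and link against cudart.lib (the DLL CUDA runtime),
# not cudart_static.lib, in the project's Additional Dependencies.
```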

Using https://devtalk.nvidia.com/default/topic/979462/how-can-i-compile-cuda-code-then-link-it-to-a-c-clr-project/
for the "TOTAL:" points:

Do points 1), 2), …

Don't forget, after point No. 3, to do:
*.cu -> right mouse button -> Properties -> Item Type = CUDA C/C++

First install the CUDA Toolkit (https://developer.nvidia.com/cuda-downloads)
and select your platform to download the toolkit.
After that, go through all the points.

Hi
Thank you for the reply.

I have tried following what you mentioned in the link. Independent CUDA projects run fine, but as I go through your steps in the link, I do not see "Project -> Add New Item -> NVIDIA CUDA 8.0 -> Code -> CUDA C/C++ File" in an empty CLR project. (This option does appear in an NVIDIA project, which means the CUDA toolkit is working; I have also done other CUDA projects, which indicates CUDA is surely fine.) What I am missing is probably some configuration step!

In the .cu file that I added myself (as I mentioned in my last comment), the properties do in fact show "Item Type = CUDA C/C++". So I don't know what is causing the problem.

And regarding the error I am getting (_CrtIsValidHeapPointer): this has something to do with memory and the local heap, but the code in my main form is the one you posted in the link. So how do I solve this error?

I would be glad to know if there is something more to it than following your steps. I am using Visual Studio Community 2015.

Exact same problem here. Any updates by chance?

Best