CUDA and Labview

Hello everyone, CUDA noob here. I am trying to integrate CUDA into a LabVIEW program using the DLL node within LabVIEW. The basic layout of what I am trying to do is this:

  1. Acquire data
  2. Do some minor Calculations on the CPU
  3. Allocate Memory on the Video card
  4. Copy data to the GPU
  5. Signal Process on the Video Card
  6. Copy data off the GPU
  7. Display data

The problem I am currently having is in the cudaMalloc step. I am calling cudaMalloc as described in the literature and examples, but it crashes my LabVIEW program. I know cudaMalloc is responsible because when I comment out everything downstream of cudaMalloc, the DLL still crashes LabVIEW; when I also comment out cudaMalloc itself, the DLL does not crash LabVIEW.
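
One way to narrow down a crash like this is to have the DLL hand the CUDA status back to the caller instead of assuming the call succeeded. The sketch below is a hypothetical export (the name TryDeviceAlloc and its parameters are illustrative and do not come from the original post). It uses only cudaMalloc, cudaFree and cudaGetErrorString from the runtime API. It cannot help if the DLL or one of its dependencies fails to load, but it does separate "cudaMalloc returned an error" from "the process crashed".

```cuda
#include <cuda_runtime.h>
#include <string.h>

#ifdef _WIN32
#define DLL_EXPORT extern "C" __declspec(dllexport)
#else
#define DLL_EXPORT extern "C"
#endif

// Hypothetical diagnostic export: tries to allocate numBytes on the device
// and frees them again. Returns 0 (cudaSuccess) on success, otherwise the
// CUDA error code. A readable message is always copied into errMsg.
DLL_EXPORT int TryDeviceAlloc(int numBytes, char *errMsg, int errMsgLen)
{
    void *d_buf = NULL;
    cudaError_t err = cudaMalloc(&d_buf, (size_t)numBytes);
    if (err == cudaSuccess) {
        err = cudaFree(d_buf);
    }
    if (errMsg != NULL && errMsgLen > 0) {
        // cudaGetErrorString also returns a string for cudaSuccess
        strncpy(errMsg, cudaGetErrorString(err), (size_t)errMsgLen - 1);
        errMsg[errMsgLen - 1] = '\0';
    }
    return (int)err;
}
```

On the LabVIEW side, errMsg would presumably be configured as a C string pointer with a preallocated minimum size, and the return value wired to an indicator.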

I am able to run all of the pre-compiled .exe files that I have downloaded so far with no errors that I could detect, so I am pretty sure that CUDA is working.

I guess my ultimate question is, “Is there a way to run a CUDA DLL from LabVIEW? Has anyone successfully done so?” I can give more detailed information if needed.

Thanks,
Austin

Edit: Sorry, I forgot to mention how much memory I am allocating compared with what is on the card. The card I am using is an 8800 GTS with 320MB of GDDR3 memory. I am trying to allocate around 8 - 9 MB of data on the GPU for processing.

Which version of CUDA are you using?

I am using the 0.8 version from the website. I have downloaded the 97.73 video drivers as well. I should also note that my compiler is Visual Studio 2005 Professional and that I am not receiving any compiler/linker errors.

Thanks,

Austin

Austin,
it is a known bug of 0.8, fixed in 0.9.
If you are not a registered developer, you will need to wait a little bit.

Massimiliano

Is the bug that CUDA can’t run with LabVIEW, or that there is a general cudaMalloc error? When is CUDA 0.9 supposed to be out?

On a side note, do I need to be using NVCC to compile? I am just using VS2005. Could this be a cause of problems? The only function I am calling at this point is cudaMalloc.

Thanks,

Austin

It was a bug related to DLLs.
If you are just making cudaMalloc and CUBLAS or CUFFT calls, VS2005 will work.
If you need to compile a real cuda file (.cu), you will need nvcc.

The 0.9 release is going to be out soon.

I was able to join the NVIDIA Developers group and get access to the CUDA 0.9 suite. I installed it on my development and test boxes (they are two separate computers). On the test box I also installed the drivers for the NVIDIA card, which I downloaded from the developers website. I cannot install the drivers on my development PC since it is a Lenovo T60, which has a lame ATI card. After compiling the code from the first post in the thread with the new libs and include headers, the same error happens: when the code gets to the cudaMalloc line, the DLL crashes. Again, I know it is cudaMalloc for the same reason stated in the first post.

My questions are these:

1.) Do I need to have the NVIDIA drivers installed on my development PC?

2.) How can I get cudaMalloc to be called from an external DLL?

3.) I have not been able to access the developers forums. Are they down?

Thanks,

Austin

I think I fixed one of the problems here. After starting from scratch with both the CUDA program and the LabVIEW program, it quickly became apparent that LabVIEW wasn’t finding the CUDA DLL files (cudart.dll, cuda.dll, etc.). After copying these DLLs into the same folder as the VI and the DLL I wrote, the program runs. However, there is another issue that I will be posting in a separate thread.

I checked my environment variables, and the CUDA path had been added, but LabVIEW apparently does not pick it up. The other solution would be to place the CUDA DLLs in the c:\windows\system32 folder, since I know LabVIEW checks that folder at startup.
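
Once the DLLs are found, a quick probe can confirm what the CUDA runtime sees from inside LabVIEW's process. The sketch below is a hypothetical export (ProbeCuda is an illustrative name, not something from this thread). It assumes a toolkit recent enough to provide cudaRuntimeGetVersion and cudaDriverGetVersion, which may not be true of the 0.8 and 0.9 releases discussed here. If cudart.dll cannot be located, the DLL should fail to load in the first place, so getting any return value at all is already informative.

```cuda
#include <cuda_runtime.h>

#ifdef _WIN32
#define DLL_EXPORT extern "C" __declspec(dllexport)
#else
#define DLL_EXPORT extern "C"
#endif

// Hypothetical probe export. Fills in the runtime version, driver version and
// device count as seen from the calling process (here, LabVIEW).
// Returns 0 (cudaSuccess) if all three queries succeed, else the first error code.
DLL_EXPORT int ProbeCuda(int *runtimeVersion, int *driverVersion, int *deviceCount)
{
    *runtimeVersion = 0;
    *driverVersion = 0;
    *deviceCount = 0;

    cudaError_t err = cudaRuntimeGetVersion(runtimeVersion);
    if (err != cudaSuccess) return (int)err;

    // A driver version of 0 means no CUDA driver is installed.
    err = cudaDriverGetVersion(driverVersion);
    if (err != cudaSuccess) return (int)err;

    err = cudaGetDeviceCount(deviceCount);
    return (int)err;
}
```

On a machine without an NVIDIA card, such as the development laptop mentioned earlier, a nonzero return code or a driver version of 0 would be the expected outcome.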

Austin-

Just wondering, how did this turn out for you? We recently purchased one of the Tesla cards and are trying to operate it using LabVIEW 8.5. I’ve read through the manual but I am still confused about how to access everything through the LabVIEW interface. Do you have any suggestions that could get me started?

Thanks,

Lucas Yeary

Lucas, et al.

I ported a G version of a Black-Scholes PDE from LabVIEW to an external C/CUDA DLL back in Oct '07. I used a mixture of VS2005 and NVCC to generate the binaries with v1.0 of the CUDA SDK. To accelerate the porting process, I used the high-level API for the CUDA run-time.

I ran across very few issues in getting the DLL to work. My development path included several steps to ensure I didn’t introduce errors during the porting process:

  1. Created C++ template version of the code to test against LabVIEW’s double-precision version. [sanity and accuracy]
  2. Converted the template to a strict ANSI C version then tested that code against the single-precision version of the C++ template. [again for sanity & accuracy]
  3. Updated the C version to include CUDA kernel calls. [once more tested results against the C & C++ versions]

I have to say it went much smoother than I thought and the development was considerably shorter than expected. I’m actually writing an article that covers this for CUDAZone. You may want to keep an eye out for it over the next month or so.

I’ve run this algorithm successfully on three different NVIDIA boards: 8600GT, 8800GTS (512M), and D870 (dual Tesla). The solution passes LabVIEW arrays to and from the DLL, allocates buffers on the CUDA device, and (inside the DLL) transfers 1D and 2D data between host & device.
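
The DLL described here is not shown in the thread, so the following is only a sketch of what a 2D host-to-device round trip can look like with the runtime API, not the actual code. The export name RoundTrip2D is hypothetical, and the sketch assumes the LabVIEW 2D array arrives as a contiguous, row-major block of floats. It uses cudaMallocPitch so that device rows are padded for alignment, and cudaMemcpy2D to translate between the host pitch and the device pitch.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

#ifdef _WIN32
#define DLL_EXPORT extern "C" __declspec(dllexport)
#else
#define DLL_EXPORT extern "C"
#endif

// Hypothetical export: copies a rows x cols float matrix to the device and back.
// Assumes both host arrays are contiguous and row-major.
// Returns 0 (cudaSuccess) or a CUDA error code.
DLL_EXPORT int RoundTrip2D(const float *h_in, float *h_out, int rows, int cols)
{
    if (h_in == NULL || h_out == NULL || rows <= 0 || cols <= 0) {
        return (int)cudaErrorInvalidValue;
    }

    size_t hostPitch = (size_t)cols * sizeof(float); // bytes per host row
    size_t devPitch = 0;                             // bytes per device row (padded)
    float *d_buf = NULL;

    cudaError_t err = cudaMallocPitch((void **)&d_buf, &devPitch, hostPitch, (size_t)rows);
    if (err != cudaSuccess) return (int)err;

    err = cudaMemcpy2D(d_buf, devPitch, h_in, hostPitch,
                       hostPitch, (size_t)rows, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // A kernel working on d_buf would run here, stepping between rows by devPitch bytes.
        err = cudaMemcpy2D(h_out, hostPitch, d_buf, devPitch,
                           hostPitch, (size_t)rows, cudaMemcpyDeviceToHost);
    }

    cudaFree(d_buf);
    return (int)err;
}
```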

If you have specific questions regarding LV & CUDA, just post them to this thread and I should be able to help out.

Darren Schmidt
NI LabVIEW Math & Signal Processing Group

Hey all,

This is a question mostly for Darren Schmidt. Have you managed to write an article or a procedure for integrating CUDA with LabVIEW? I’ll probably start digging into this soon, so I was wondering if there were any updates on the subject. I’ve had a look at CUDA Zone and didn’t find anything relevant.

Thanks very much,

Audrey

I will be testing the FFT library soon. Could you upload a small LabVIEW example for me?

I’m replying to this thread just to help people jump-start using CUDA with Labview in Windows.

Update: NI announced their CUDA package (Welcome to NI Labs LabVIEW GPU Computing - NI Community):
http://decibel.ni.com/content/thread/3524
At first sight it does not look that easy to use. Anyway, if anybody is interested in how to use CUDA from LabVIEW from scratch, feel free to read on.

This is not meant to be an exhaustive introduction, and some familiarity with CUDA is expected (please refer to the CUDA documentation; the first couple of chapters of the Programming Guide are enough to start):
http://www.nvidia.com/object/cuda_develop.html

So far I’ve been successfully using CUDA 2.3 (in Visual Studio 2008) and LabVIEW 8.6 for some real-time image processing, and it has been working really well for me. My algorithm needs to perform around 10 filtering operations (FFT, pointwise matrix multiplication, IFFT) and it does that in less than 50ms for 1024x1024 images (compared with over 300ms on the CPU). This was enough for our real-time application.
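
The filtering code itself is not posted in this thread, so the following is only a sketch of what one "FFT, pointwise multiply, IFFT" pass can look like with CUFFT. The names FilterImage and MultiplyAndScale are hypothetical, and it assumes complex-to-complex transforms on interleaved (re, im) float data for simplicity. CUFFT transforms are unnormalized, so the 1/n scaling is folded into the multiply kernel.

```cuda
#include <cuda_runtime.h>
#include <cufft.h>
#include <stddef.h>

#ifdef _WIN32
#define DLL_EXPORT extern "C" __declspec(dllexport)
#else
#define DLL_EXPORT extern "C"
#endif

// Pointwise complex multiply of the spectrum by the filter, combined with
// the 1/n normalisation that CUFFT's inverse transform does not apply.
__global__ void MultiplyAndScale(cufftComplex *spec, const cufftComplex *filt,
                                 int n, float scale)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        cufftComplex a = spec[i];
        cufftComplex b = filt[i];
        spec[i].x = (a.x * b.x - a.y * b.y) * scale;
        spec[i].y = (a.x * b.y + a.y * b.x) * scale;
    }
}

// Hypothetical export: filters a height x width complex image in place.
// h_img and h_filter each hold 2*height*width floats (interleaved re, im);
// h_filter is the filter's frequency response. Returns 0 on success, 1 on failure.
DLL_EXPORT int FilterImage(float *h_img, const float *h_filter, int height, int width)
{
    int status = 1;                       // assume failure until every step succeeds
    int n = height * width;
    if (h_img == NULL || h_filter == NULL || height <= 0 || width <= 0) return status;

    size_t bytes = (size_t)n * sizeof(cufftComplex);
    cufftComplex *d_img = NULL, *d_filt = NULL;
    cufftHandle plan = 0;
    bool havePlan = false;

    if (cudaMalloc((void **)&d_img, bytes) == cudaSuccess &&
        cudaMalloc((void **)&d_filt, bytes) == cudaSuccess &&
        cudaMemcpy(d_img, h_img, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(d_filt, h_filter, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        cufftPlan2d(&plan, height, width, CUFFT_C2C) == CUFFT_SUCCESS) {
        havePlan = true;
        int threads = 256;
        int blocks = (n + threads - 1) / threads;   // round up to cover all n elements
        if (cufftExecC2C(plan, d_img, d_img, CUFFT_FORWARD) == CUFFT_SUCCESS) {
            MultiplyAndScale<<<blocks, threads>>>(d_img, d_filt, n, 1.0f / n);
            if (cudaGetLastError() == cudaSuccess &&
                cufftExecC2C(plan, d_img, d_img, CUFFT_INVERSE) == CUFFT_SUCCESS &&
                cudaMemcpy(h_img, d_img, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
                status = 0;
            }
        }
    }

    if (havePlan) cufftDestroy(plan);
    cudaFree(d_filt);                     // cudaFree(NULL) performs no operation
    cudaFree(d_img);
    return status;
}
```

For a chain of around ten filters, the plan and the device buffers would normally be created once and reused between calls rather than rebuilt each time; this sketch skips that to stay short.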

  1. So, the first thing on your list is to install the CUDA drivers, toolkit and SDK from the NVIDIA website (you need all of them).

  2. The second thing is to build DLLs with CUDA. In short, you want to start with the CUDA template project that comes in the “…SDK\C\src” folder. You need to modify the code and the project properties (under the Project tab) so the compiler knows it needs to build a DLL. Don’t forget to put __declspec(dllexport) in front of the declaration of every function you want exported from your DLL, as in __declspec(dllexport) void ScaleMatrix(...) in the code at the end of this post.

Extensive information on how to build a DLL can be found in other threads, like this one:
http://forums.nvidia.com/index.php?showtopic=97928&pid=545650&mode=threaded&start=0#entry545650

  3. You need to import that DLL into LabVIEW. You do this with the Call Library Function Node (under Connectivity/Libraries and Executables).
    In short, you need to specify the path to your DLL (by default it will be in the “…SDK\C\bin\win32\Debug” folder) and add the parameters that LabVIEW is going to pass. The parameters must match the parameters in your DLL source code exactly (type, order, and value versus pointer); otherwise LabVIEW crashes.

A detailed explanation of how the Call Library Function Node works can be found here (it’s way more than you need if you are a beginner):
Product Documentation - NI

  4. Finally, if you want to share your DLL with other computers, don’t forget to compile the Release version of the DLL.
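
As a hedged illustration of steps 2 and 3, here is a CPU-only export that can be used to check the Call Library Function Node configuration before any GPU code is involved. The function name ScaleArrayHost is hypothetical, and the node settings listed in the comments are my assumption of the matching configuration, so check them against your LabVIEW version. The extern "C" qualifier keeps the exported name undecorated; without it, a C++ compiler (which is how .cu files are built) exports a mangled name.

```cuda
#ifdef _WIN32
#define DLL_EXPORT extern "C" __declspec(dllexport)
#else
#define DLL_EXPORT extern "C"
#endif

// Hypothetical CPU-only export with the parameter list a GPU routine would have.
// Assumed Call Library Function Node settings:
//   a      : Array, 4-byte Single, 1 dimension, Array Data Pointer
//   alpha  : Numeric, 4-byte Single, passed by Value
//   n      : Numeric, Signed 32-bit Integer, passed by Value
//   return : Numeric, Signed 32-bit Integer
//   calling convention: C
DLL_EXPORT int ScaleArrayHost(float *a, float alpha, int n)
{
    if (a == 0 || n < 0) return -1;   // report bad arguments instead of crashing
    for (int i = 0; i < n; ++i) {
        a[i] *= alpha;                // scale in place
    }
    return 0;
}
```

If this call returns the scaled array correctly in LabVIEW, the node configuration is sound and the same parameter list can be reused for the GPU version.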

That’s it. I’m attaching simple code that scales an array. The VI creates a 1D array and a scaling constant, and passes those to a DLL that performs the scaling on the GPU (note: scaling is done in place).

Note: Your LabVIEW VI needs to be closed while the DLL is being compiled, most likely because LabVIEW keeps the DLL loaded (and the file locked) for as long as the VI is open. This is a slight nuisance when you need to go back and forth between LabVIEW and VS, because you have to close the VI every time.

I’m more than open to suggestions, comments, etc. that would improve this brief description of how to use CUDA with LabVIEW.

Hope it helps

Nenad
BU Biomicroscopy Lab

P.S. For some reason I cannot attach VIs, so I’m posting only a snapshot.


// includes, project
#include <cutil_inline.h>

// LabVIEW passes the array 'h_a' ('h' stands for host), the scalar 'alpha' and the array size.

#define BLOCKSIZE 512 // 512 is the maximum number of threads per block on compute capability 1.x devices.

__global__ void ScaleMatrix_Kernel(float *d_a, float alpha, int arraySize)
{
    // Block index
    int bx = blockIdx.x;
    // Thread index
    int tx = threadIdx.x;

    int begin = blockDim.x * bx;
    int index = begin + tx;

    // Copies the array into shared memory. This matters only when threads
    // communicate with each other; it is not necessary here since we are
    // only scaling a vector.
    __shared__ float d_as[BLOCKSIZE];
    if (index < arraySize)   // the last block may be only partially filled
        d_as[tx] = d_a[index];

    __syncthreads();

    // copies the scaled values back to global device memory
    if (index < arraySize)
        d_a[index] = alpha * d_as[tx];
}

__declspec(dllexport) void ScaleMatrix(float *h_a, float alpha, int arraySize)
{
    if (arraySize <= 0)
        return;   // nothing to scale; also avoids launching an empty grid

    unsigned int mem_size = sizeof(float) * arraySize;

    // allocate device memory
    float *d_a;
    cutilSafeCall(cudaMalloc((void**) &d_a, mem_size));
    // copy host memory to global device memory
    cutilSafeCall(cudaMemcpy(d_a, h_a, mem_size, cudaMemcpyHostToDevice));

    // setup execution parameters: enough blocks to cover the whole array,
    // rounded up, so arraySize need not be a multiple of BLOCKSIZE
    dim3 dimGrid((arraySize + BLOCKSIZE - 1) / BLOCKSIZE, 1, 1);
    dim3 dimBlock(BLOCKSIZE, 1, 1);

    // execute the kernel
    ScaleMatrix_Kernel<<<dimGrid, dimBlock>>>(d_a, alpha, arraySize);

    // copy device memory to host
    cutilSafeCall(cudaMemcpy(h_a, d_a, mem_size, cudaMemcpyDeviceToHost));

    cutilSafeCall(cudaFree(d_a));
}

------------------ end ---------------------------------
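
The listing above wraps every runtime call in cutilSafeCall from the SDK's cutil helpers. As far as I know, that macro prints a message and exits the process on failure, which inside a DLL would take LabVIEW down with it. The sketch below is a hypothetical variant (ScaleArrayChecked and ScaleKernel are illustrative names, not the original author's code) that depends only on cuda_runtime.h, skips the shared-memory staging, and returns the CUDA status code so the VI can decide what to do.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

#ifdef _WIN32
#define DLL_EXPORT extern "C" __declspec(dllexport)
#else
#define DLL_EXPORT extern "C"
#endif

// Scales n floats in place; each thread handles one element.
__global__ void ScaleKernel(float *d_a, float alpha, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {                 // guard the last, partially filled block
        d_a[i] *= alpha;
    }
}

// Hypothetical variant of the export: returns 0 (cudaSuccess) or a CUDA error code.
DLL_EXPORT int ScaleArrayChecked(float *h_a, float alpha, int n)
{
    if (h_a == NULL || n <= 0) return (int)cudaErrorInvalidValue;

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;   // round up
    const size_t bytes = (size_t)n * sizeof(float);
    float *d_a = NULL;

    cudaError_t err = cudaMalloc((void **)&d_a, bytes);
    if (err != cudaSuccess) return (int)err;

    err = cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        ScaleKernel<<<blocks, threads>>>(d_a, alpha, n);
        err = cudaGetLastError();                     // catches launch errors
    }
    if (err == cudaSuccess) {
        // this copy waits for the kernel to finish before reading the results
        err = cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(d_a);
    return (int)err;
}
```

In the Call Library Function Node, the return type would then be a signed 32-bit integer, with 0 meaning success.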

great!
thanks very much!

I have problems with the memcopy function in LabVIEW. It works fine but it is too slow. Can you post a link to your LabVIEW VI?

Not to get your hopes up, but if all goes well, I might (that’s the key word) have something very interesting for you Labview folks to play with in a few months. I’ll post here (or perhaps in a new thread) when I have more info for you.

[Attachment: vi.jpg]

I haven’t had time to post it on the LabVIEW forum yet, but here is the snapshot. That should be enough.