Hi, first of all, sorry for my poor English.
I’m still developing with CUDA. I’ve been programming a few examples using CUDA and OpenGL and the results are great, but now I have a problem that I can’t solve after many attempts.
I’m now trying to use two devices for computing. They are two GeForce 9600 GT cards on a 780 SLI motherboard. I’ve disabled SLI mode so that CUDA can see both devices.
I’m following the ‘simpleMultiGPU’ example from the CUDA SDK. This example compiles and runs well (although the CPU [Quad 9300] compute time is lower than the GPU compute time, two times faster!), CUDA detects both devices in my PC, and the two threads are launched correctly.
Let me tell you my problem:
My example is very simple. I have a plane mesh defined by vertices and indices, and on each frame the position of every vertex changes by -0.1 units along the Y axis. I’m using VBOs (vertex buffer objects) and IBOs (index buffer objects).
When I run my example without threads (I mean application threads, not CUDA threads), I have no problem. The plane moves quickly and smoothly along the Y axis. What I do is:
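To make the per-frame update concrete, here is a minimal sketch of such a kernel. It is not the exact code from my application: the names `shiftY_kernel` and `shiftOnDevice` are made up for illustration, and the helper uses a plain `cudaMalloc` buffer instead of a mapped VBO so that it can run without an OpenGL context. It also assumes a recent toolkit, so it calls `cudaDeviceSynchronize` rather than the older `cudaThreadSynchronize`.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: moves every vertex by `dy` along the Y axis.
__global__ void shiftY_kernel(float3 *vertices, unsigned int count, float dy)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)                 // guard the last, partially filled block
        vertices[i].y += dy;
}

// Hypothetical host helper: applies one frame's shift to a copy of `host`
// and returns the result. In the real application the device pointer would
// come from mapping the VBO instead of from cudaMalloc.
std::vector<float3> shiftOnDevice(const std::vector<float3> &host, float dy)
{
    std::vector<float3> result(host.size());
    if (host.empty())
        return result;

    float3 *d_vertices = NULL;
    const size_t bytes = host.size() * sizeof(float3);
    cudaMalloc((void **)&d_vertices, bytes);
    cudaMemcpy(d_vertices, host.data(), bytes, cudaMemcpyHostToDevice);

    // One CUDA thread per vertex, rounded up to whole blocks.
    const unsigned int nThreads = 256;
    const unsigned int nBlocks =
        (unsigned int)((host.size() + nThreads - 1) / nThreads);
    shiftY_kernel<<<nBlocks, nThreads>>>(d_vertices,
                                         (unsigned int)host.size(), dy);
    cudaDeviceSynchronize();       // wait for the kernel to finish

    cudaMemcpy(result.data(), d_vertices, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_vertices);
    return result;
}
```

Calling `shiftOnDevice(plane, -0.1f)` once per frame on a vector of `float3` vertices corresponds to the movement described above. Error checking is left out to keep the sketch short.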
[indent]1. Create VBO and IBO
2. Register VBO and IBO in CUDA
3. For each frame:
[indent]3.1. Map VBO in device memory
3.2. Modify vertices in device memory
3.3. Unmap VBO
3.4. Draw plane using OpenGL[/indent][/indent]
As I said, this works perfectly. But when I use multithreading, the application crashes:
[indent]1. Create VBO and IBO
2. Register VBO and IBO in CUDA
3. For each frame:
[indent]3.1. Create two threads.
3.2. Set the device in CUDA for each thread.
3.3. Map VBO in device memory
3.4. Modify vertices in device memory
3.5. Unmap VBO
3.6. Draw plane using OpenGL[/indent][/indent]
I’m using the same code as ‘simpleMultiGPU’ to create the threads:
threads [0] = cutStartThread((CUT_THREADROUTINE)dispatcher, (void *)(&data1));
threads [1] = cutStartThread((CUT_THREADROUTINE)dispatcher, (void *)(&data2));
The ‘dispatcher’ function sets the device and maps the VBO, then executes the kernel and unmaps the VBO:
[b][i]// Set the device for this host thread
cudaSetDevice(data->device);
// Map the VBO into the device address space
float3 *d_vboPlane;
[u]CUDA_SAFE_CALL(cudaGLMapBufferObject((void**)&d_vboPlane, data->planeId)); // CRASHES[/u]
// Set up grid and block dimensions
dim3 blk(nBlocks, 1, 1);
dim3 thrd(nThreads, 1, 1);
// Launch the kernel
sampleMultiThread_kernel<<<blk, thrd>>>(d_vboPlane, data->planeSize);
// Wait for the kernel to finish
cudaThreadSynchronize();
// Unmap the VBO
CUDA_SAFE_CALL(cudaGLUnmapBufferObject(data->planeId));
[/i][/b]
When the application reaches ‘cudaGLMapBufferObject’, it crashes. The message in the output window is:
First-chance exception at 0x77d4dd10 in testOpenGLCubo.exe: Microsoft C++ exception: cudaError_enum at memory location 0x051efdb8…
If I run only one thread, the application crashes in the same way.
I’ve searched the forum for this error, and the partial solutions discussed here haven’t solved my problem :(. Could you please help me?
Thanks in advance.