Error executing two threads using OpenGL

Hi, first of all, sorry for my poor English.

I keep developing in CUDA; I'm programming a few examples using CUDA and OpenGL, and the results are great, but now I have a problem I can't solve after a lot of tries.

I'm now trying to use two devices for computation. They're two GeForce 9600 GT cards on a 780 SLI motherboard. I've disabled SLI mode, so CUDA can see the two devices.

I'm following the 'simpleMultiGPU' example in the CUDA SDK. That example compiles and runs well (although the CPU [Quad 9300] compute time is lower than the GPU compute time, twice as fast!), CUDA detects two devices on my PC, and the two threads are launched correctly.

Let me tell you my problem:

My example is very simple. I have a plane mesh, defined with vertices and indices; each frame, the position of every vertex changes by -0.1 units along the Y axis. I'm using VBOs (vertex buffer objects) and IBOs (index buffer objects).

Executing my example without threads (I mean application threads, not CUDA threads), I have no problem. The plane moves quickly and smoothly along the Y axis. What I do is:

[indent]1. Create VBO and IBO
2. Register VBO and IBO in CUDA
3. For each frame:
[indent]3.1. Map VBO in device memory
3.2. Modify vertex in device memory
3.3. Unmap VBO
3.4. Draw plane using OpenGL[/indent][/indent]
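For context, steps 3.1 to 3.4 above could be sketched roughly like this (a minimal sketch using the old CUDA 2.x GL-interop API mentioned in the post; the kernel name, `updateFrame`, and the launch configuration are my placeholders, not the actual code):

```cuda
#include <cuda_gl_interop.h>

// Sketch of one frame of the single-threaded version.
__global__ void movePlane(float3 *verts, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        verts[i].y -= 0.1f;   // step 3.2: shift every vertex -0.1 in Y
}

void updateFrame(GLuint vboId, int nVerts)
{
    float3 *d_verts;

    // 3.1: map the registered VBO into device memory
    cudaGLMapBufferObject((void **)&d_verts, vboId);

    // 3.2: modify the vertices on the device
    int nThreads = 256;
    int nBlocks  = (nVerts + nThreads - 1) / nThreads;
    movePlane<<<nBlocks, nThreads>>>(d_verts, nVerts);

    // 3.3: unmap so OpenGL owns the buffer again
    cudaGLUnmapBufferObject(vboId);

    // 3.4: the caller now draws the plane with OpenGL (e.g. glDrawElements)
}
```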

As I told you before, it works perfectly. But when I use multithreading, the application crashes:

[indent]1. Create VBO and IBO
2. Register VBO and IBO in CUDA
3. For each frame:
[indent]3.1. Create two threads.
3.2. Set the device in CUDA for each thread.
3.3. Map VBO in device memory
3.4. Modify vertex in device memory
3.5. Unmap VBO
3.6. Draw plane using OpenGL[/indent][/indent]

I'm using the same thread-creation code as 'simpleMultiGPU':

threads[0] = cutStartThread((CUT_THREADROUTINE)dispatcher, (void *)&data1);
threads[1] = cutStartThread((CUT_THREADROUTINE)dispatcher, (void *)&data2);

The 'dispatcher' function sets the device and maps the VBO, then executes the kernel and unmaps the VBO:

// Set the device
cudaSetDevice(data->device);

// Map the VBO into device memory
float3 *d_vboPlane;
CUDA_SAFE_CALL(cudaGLMapBufferObject((void **)&d_vboPlane, data->planeId)); // CRASHES

// Create launch dimensions
dim3 blk(nBlocks, 1, 1);
dim3 thrd(nThreads, 1, 1);

// Call the kernel
sampleMultiThread_kernel<<<blk, thrd>>>(d_vboPlane, data->planeSize);

// Wait for the kernel to finish
cudaThreadSynchronize();

// Unmap the object
CUDA_SAFE_CALL(cudaGLUnmapBufferObject(data->planeId));

When the application reaches 'cudaGLMapBufferObject', it crashes. The message in the output is:

First-chance exception at 0x77d4dd10 in testOpenGLCubo.exe: Microsoft C++ exception: cudaError_enum at memory location 0x051efdb8…

If I execute only one thread, the application crashes in the same way.

I've searched for this error in the forum, and the partial solutions discussed here haven't solved my problem :(. Please, could you help me?

Thanks in advance.

Anybody, please?

I don't understand exactly what you're trying to do. Do you have a single OpenGL context? You have to make sure all GL calls come from the same thread.

Anyway, there is no real advantage to using OpenGL interop across multiple GPUs (it just does copies internally). If I were you, I would simply read the geometry back to the CPU and render from there.
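A rough sketch of that suggestion: each GPU works on a plain device buffer and only the GL-owning thread ever touches OpenGL. All names here, and the fixed -0.1 Y step, are illustrative assumptions, not code from this thread:

```cuda
#include <cuda_runtime.h>

// Per-thread job: one half of the mesh, computed on one GPU, written back
// to host memory. The GL thread later uploads both halves into the VBO.
struct HalfJob {
    int     device;   // CUDA device this thread uses
    float3 *h_verts;  // host pointer to this half's vertices (in/out)
    int     count;    // number of vertices in this half
};

__global__ void moveHalf(float3 *verts, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        verts[i].y -= 0.1f;
}

void workerThread(HalfJob *job)
{
    cudaSetDevice(job->device);

    float3 *d_verts;
    size_t bytes = job->count * sizeof(float3);
    cudaMalloc((void **)&d_verts, bytes);
    cudaMemcpy(d_verts, job->h_verts, bytes, cudaMemcpyHostToDevice);

    int nThreads = 256;
    moveHalf<<<(job->count + nThreads - 1) / nThreads, nThreads>>>(d_verts, job->count);

    // Read the result back; note that no GL call happens in this thread.
    cudaMemcpy(job->h_verts, d_verts, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_verts);
}
```

After both worker threads finish, the thread that owns the GL context uploads the updated vertices into the VBO (for example with glBufferSubData) and draws.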

Sorry, I have a very, very poor English level; I'll try to explain my problem better.

Soon (in a few months) we have to write an application that performs a lot of calculations on various meshes. Basically, we have to modify the mesh vertex positions each frame using an advanced, weighted mathematical algorithm. This algorithm is so heavy that we get only 0.5 to 1 frames per second on a quad-core CPU.

Our goal is to implement this algorithm in CUDA, using parallel computation to get better results, at least real-time performance (25 fps). The vertex calculations are independent: you can apply the algorithm to one vertex without knowing the values of the rest.

We want to take advantage of our two GeForce 9600 cards, so we have to use multithreading. Basically, our intention is to map the mesh in CUDA and then have each device calculate half of the mesh vertices:

VBO Mesh → Map VBO in CUDA (device memory) → Create Threads

– Thread 1 (on device 1) calculates half of the vertices

– Thread 2 (on device 2) calculates the other half
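The half-and-half split above can be expressed as a small host-side helper (a minimal sketch; the convention that device 0 takes the extra vertex when the count is odd is my assumption):

```cpp
#include <cassert>

// Split nVerts vertices into two contiguous ranges, one per device.
// Device 0 gets the extra vertex when the count is odd.
void splitRange(int nVerts, int device, int *offset, int *count)
{
    int half  = nVerts / 2;
    int extra = nVerts % 2;
    if (device == 0) {
        *offset = 0;
        *count  = half + extra;   // first half, plus the remainder
    } else {
        *offset = half + extra;
        *count  = half;           // second half
    }
}
```

Each thread would then run its kernel on `verts + offset` with `count` elements, so the two devices never touch the same vertices.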

Could CUDA still give me any advantage for this problem?

Sorry again for my English, and thanks, thanks a lot for your time.