linux: 5 second runtime for CUDA on display GPU?

Hi, the NVIDIA CUDA Linux Release Notes (again, here: http://developer.download.nvidia.com/compu…_1.1_linux.txt) indicate the following:

-----8<-------------------------------------------------------------------------
o Individual GPU program launches are limited to a run time
of less than 5 seconds on a GPU with a display attached.
Exceeding this time limit causes a launch failure reported
through the CUDA driver or the CUDA runtime. GPUs without
a display attached are not subject to the 5 second run time
restriction. For this reason it is recommended that CUDA is
run on a GPU that is NOT attached to an X display.
-----8<-------------------------------------------------------------------------

I don’t really understand this. The SDK includes many examples of GPU code being used in concert with OpenGL and GLUT code, which is surely running on the same graphics card. How do OpenGL and CUDA work together if they are not on the same GPU / graphics card? This is one of the key ways I want to use CUDA: to process data on the card and immediately visualise it.

Should this say “function call / API call” rather than “program” in the first sentence???

thanks,

David Barnes.

This 5 seconds is the limit for a kernel. Your program may take 10 years, as long as each individual kernel takes less than 5 seconds.
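For example, a long computation can be split into many shorter kernel launches so that no single launch runs long enough to trip the watchdog. A minimal sketch (the kernel, sizes, and chunking below are made up for illustration, not taken from anything in this thread):

// Hypothetical sketch (not from this thread): process a large array
// in chunks, one kernel launch per chunk, so each launch stays well
// under the 5 second limit.
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void scale(float *d_data, int offset, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        d_data[offset + i] *= 2.0f;
}

int main(void)
{
    const int total = 1 << 24;   // total number of elements
    const int chunk = 1 << 20;   // elements processed per kernel launch

    float *d_data;
    cudaMalloc((void **)&d_data, total * sizeof(float));
    cudaMemset(d_data, 0, total * sizeof(float));

    for (int offset = 0; offset < total; offset += chunk) {
        int n = (total - offset < chunk) ? (total - offset) : chunk;
        scale<<<(n + 255) / 256, 256>>>(d_data, offset, n);
        // Wait for this launch to finish before issuing the next one;
        // each individual launch is short, so the watchdog never fires.
        cudaThreadSynchronize();   // cudaDeviceSynchronize() on newer toolkits
    }

    cudaFree(d_data);
    printf("done\n");
    return 0;
}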

Is this related to X or not? If I run a console display, does the 5 second limit still apply?

– Kuisma

The 5 second limit does not apply when you run a text console without X.
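On toolkits newer than the 1.1 release quoted above you can also query whether a given device has the watchdog enabled, via the kernelExecTimeoutEnabled field of cudaDeviceProp. A quick sketch (this assumes a toolkit recent enough to expose that field):

// Sketch: report, for each device, whether the run time limit
// (watchdog) is enabled on it.
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d (%s): run time limit %s\n",
               dev, prop.name,
               prop.kernelExecTimeoutEnabled ? "ENABLED" : "disabled");
    }
    return 0;
}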

Hi, I am running a very simple CUDA program (pasted below) that calculates an FFT using the CUFFT library functions. It not only freezes my computer when my X display is still on (for a large data size that is not a power of two, such as 6324137, which makes the program run slowly), but it also freezes (not always, though) even when I’m running only the text console. I’m using Fedora Core 8 with a Quadro FX 1700, and I switched to the text console with the command init 3. So I’m not sure whether this is caused by the known 5 second limit, a problem with the library functions, or something else.

Thoughts, anyone?

simple fft program:

// includes, system
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>

// includes, project
#include <cuda_runtime.h>
#include <cufft.h>

// Default number of data points to use
#define NUM (2048*2048)

cufftComplex *runfft(cufftComplex *data, int num);

// Program used to calculate the FFT of a simple array.
int main(int argc, char **argv)
{
    int i;
    int num;

    // Can take a command line argument specifying a non-default data size
    if (argc > 1)
    {
        num = atoi(argv[1]);
    }
    else num = NUM;

    // Initialise the input data
    cufftComplex *data = (cufftComplex *)malloc(sizeof(cufftComplex) * num);
    for (i = 0; i < num; i++)
    {
        data[i].x = (float)i + 1;
        data[i].y = 0.0f;
    }

    // Print the input data; comment out when the data set is large
    printf("%d data points\n\nInput data\n", num);
    // for (i = 0; i < num; i++)
    // {
    //     printf("%d -- %f + %fi\n", i + 1, data[i].x, data[i].y);
    // }

    printf("\n\nAfter fft\n\n");
    data = runfft(data, num);

    // Print the resulting FFT
    // for (i = 0; i < num; i++)
    // {
    //     printf("%d -- %f + %fi\n", i + 1, data[i].x, data[i].y);
    // }

    free(data);
    return 0;
}

cufftComplex *runfft(cufftComplex *data, int num)
{
    // Allocate device memory for the data
    cufftComplex *d_data;
    cudaMalloc((void **)&d_data, sizeof(cufftComplex) * num);

    // Copy host memory to device
    cudaMemcpy(d_data, data, num * sizeof(cufftComplex), cudaMemcpyHostToDevice);

    // CUFFT plan
    cufftHandle plan;
    cufftPlan1d(&plan, num, CUFFT_C2C, 1);

    // FFT execution (in place, forward transform)
    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);

    // Copy the result back to the host
    cudaMemcpy(data, d_data, num * sizeof(cufftComplex), cudaMemcpyDeviceToHost);

    // Release the plan and device memory
    cufftDestroy(plan);
    cudaFree(d_data);

    return data;
}

Using the CUDA_SAFE_CALL macros from cutil.h might help in tracking this down; you will at least get an error printed when something bad has happened. In my case I had a freeze, but the actual error happened much earlier in my program, so wrapping the macros around the CUDA calls and adding some CUT_CHECK_ERRORs showed me the problem area.
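If you would rather not depend on cutil.h, a hand-rolled check does much the same job. The macro name below is my own, not from any NVIDIA header (the CUFFT calls return a cufftResult that can be checked in the same spirit):

// Minimal hand-rolled error check, similar in spirit to the SDK's
// CUDA_SAFE_CALL / CUT_CHECK_ERROR macros. CHECK_CUDA is a made-up name.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err));     \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main(void)
{
    float *d_buf = NULL;

    // Wrap each runtime call (e.g. the cudaMalloc/cudaMemcpy calls in
    // runfft() above) so a failure is reported where it happens.
    CHECK_CUDA(cudaMalloc((void **)&d_buf, 1024 * sizeof(float)));
    CHECK_CUDA(cudaMemset(d_buf, 0, 1024 * sizeof(float)));
    CHECK_CUDA(cudaFree(d_buf));

    printf("all calls succeeded\n");
    return 0;
}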

I am having a similar problem with erratic results running my CUDA app (sometimes it crashes X, requiring a hard reboot; sometimes it makes it through). I am running on an HP dv6000 laptop with the 8400M GT GPU. The reason I am adding to this thread is that I noticed that on the runs where it doesn’t crash X, the run time is below 5 seconds.
I am going to go home and try the same app on my workstation (8600 + 2x Tesla) to see if using a non-display-attached device helps.
Everything is currently wrapped as a safe call. Does anyone know if there are special considerations when doing CUDA memory allocation etc. on a device that shares memory with the host?

Well, things got a little more stable when I took out the CUT_DEVICE_INIT() reference.
That stopped the program from killing X during runtime, though after running a couple of GPU processes the display (for lack of a better phrase) got sick and died. It started with some color bars at the top of the screen, followed by the display server slowly but surely dying over the course of about 2 minutes. This happened while I was editing code and hadn’t run any CUDA apps for at least a good 10 minutes.

Any thoughts? Could this be related to the fact that I am running the Fedora 7 x86_64 toolkit under Fedora 8 x86_64? >.<