First of all, after months of work I am able to run my iterations on the GPU.
No doubt the results are very impressive.
But when I try to increase the number of iterations, the kernel fails with the error “unspecified launch failure”.
Surprisingly, it sometimes launches successfully for the same number of iterations.
I must say my kernel is quite bulky, but it does not exceed CUDA limits such as the register count.
I searched the forums here and found no definitive answer.
It is not the XP watchdog timer either, since the kernel fails within just a few milliseconds.
Please let me know if there is any way to find out the exact reason why CUDA is failing like this.
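To show what I am doing, here is a stripped-down sketch of how I check for the exact error code after a launch (this assumes the CUDA runtime API; `myKernel` is a made-up placeholder, not my real kernel):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Abort with the CUDA error string if a runtime call failed.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err), __FILE__, __LINE__);     \
            return 1;                                                 \
        }                                                             \
    } while (0)

__global__ void myKernel(float *out)   // placeholder kernel
{
    out[threadIdx.x] = (float)threadIdx.x;
}

int main()
{
    float *d_out;
    CUDA_CHECK(cudaMalloc(&d_out, 256 * sizeof(float)));

    myKernel<<<1, 256>>>(d_out);
    CUDA_CHECK(cudaGetLastError());       // catches launch-time errors
    CUDA_CHECK(cudaDeviceSynchronize());  // catches execution-time errors,
                                          // e.g. "unspecified launch failure"

    CUDA_CHECK(cudaFree(d_out));
    printf("ok\n");
    return 0;
}
```

Checking both `cudaGetLastError()` right after the launch and the result of the synchronize separates configuration errors from errors that happen while the kernel is running.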
Yes, another possibility might be bad hardware. But I would verify the code first. Try something like valgrind or GPU Ocelot if you can. Ocelot, in particular, is fantastic for isolating improper memory use.
Having said that, hardware can cause what you are seeing. I had one particular 9500GT DDR3 card that worked perfectly until you pushed it past about 75% of peak memory bandwidth, at which point it started behaving very erratically, including random launch failures, driver errors, and video RAM corruption. Even in standard OpenGL benchmarks it would run happily for hours, but my CUDA code could make it start failing in minutes. Emulation with valgrind, Ocelot, and cuda-gdb never found a bug in the code, and I was able to run it happily on other hardware. At the suggestion of someone here, I tried underclocking, and it helped a bit, but in the end I put it down to bad hardware and gave up on it.
I have not been able to check for a segfault so far.
The same set of iterations runs and fails intermittently.
I get the correct result from each thread when the kernel launches successfully.
I am able to go beyond the limit when I access the data linearly from constant memory, but it often fails when I access the constant memory in a haphazard (scattered) way.
avidday, I have yet to use the tools you mentioned; I am trying them now.
I will update here once I confirm.
Anyway, as I mentioned, for the same set it sometimes launches successfully with the correct output, matching what I would have got running it on the CPU.
But if I access the data linearly it works fine :) and I can go beyond the limit.
The problem is when I access the input from scattered locations.
As i read in the docs CUDA compitability 1.0 has restrictions on this. Global memory cant be accessed is such a fashion.
So i switched to constant memory. Anyway Can this be a reason?
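To illustrate what I mean by linear versus haphazard access, here is a simplified sketch (the kernel and array names are made up, not my actual code):

```cuda
#include <cuda_runtime.h>

#define N 256

__constant__ float c_data[N];

// "Linear" pattern: in each loop iteration every thread in a warp reads
// the SAME constant-memory address, which the hardware broadcasts cheaply.
__global__ void linearRead(float *out)
{
    float sum = 0.0f;
    for (int i = 0; i < N; ++i)
        sum += c_data[i];
    out[threadIdx.x] = sum;
}

// "Haphazard" pattern: threads in a warp read DIFFERENT constant-memory
// addresses in the same iteration; on compute capability 1.x these reads
// are serialized instead of broadcast, so this is much slower.
__global__ void scatteredRead(const int *idx, float *out)
{
    float sum = 0.0f;
    for (int i = 0; i < N; ++i)
        sum += c_data[idx[(threadIdx.x + i) % N]];
    out[threadIdx.x] = sum;
}
```

As far as I understand, divergent constant-memory reads within a warp are serialized on compute capability 1.x, so the scattered pattern costs performance, but I am not sure whether that alone explains a launch failure.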