I compiled with the --cubin option and looked at the per-thread register count it reports. I then counted the register variables defined in my kernel, and the two numbers roughly match.
However, my code has many places where the right-hand side (RHS) needs a number of floating-point operations, such as
ctheta=tmp0*cphi*cphi+tmp1*len;
I imagine these operations also need temporary registers to hold intermediate results while the RHS is evaluated. My question is: does the register count reported in the cubin file include these temporary registers? In other words, do they count against the 8192-register limit?
The reason I ask is that I get a “the launch timed out and was terminated” error when I set the number of threads to a larger value. From what I found online, I think this is related to the register limit. Does anyone have experience with this?
The “the launch timed out and was terminated” error occurs when a kernel runs for too long.
As far as I know, the limits are:
Windows XP: a kernel can run for at most about 5 seconds before it times out.
Windows Vista: a kernel can run for at most about 3 seconds before it times out.
Linux: no limit.
If you want to know how many registers your program uses, there are several ways to find out:
1. Use the CUDA Visual Profiler.
2. Pass the “--keep” flag to the compiler. The build then leaves the *.cubin file behind, and inside that file you can see the number of registers your program uses.
3. Pass the “--ptxas-options=-v” flag to the compiler. It prints the number of registers your program uses when the *.cu file is compiled.
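For what it's worth, the register count can also be read at run time. The sketch below assumes a toolkit that provides cudaFuncGetAttributes and uses a made-up kernel, computeCtheta, built around the expression from the question. maxThreadsByRegisters is a hypothetical helper that gives only a rough upper bound, because it ignores the hardware's register allocation granularity. The number reported is what ptxas allocated when compiling the kernel, so it counts hardware registers, including any that hold intermediate results, rather than source variables.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: rough upper bound on threads per block imposed by
// registers alone. Ignores the hardware's allocation granularity.
static int maxThreadsByRegisters(int regsPerThread, int regsPerBlock)
{
    if (regsPerThread <= 0) return 0;
    return regsPerBlock / regsPerThread;
}

// Hypothetical kernel built around the expression from the question.
__global__ void computeCtheta(const float *tmp0, const float *tmp1,
                              const float *cphi, const float *len,
                              float *ctheta, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        ctheta[i] = tmp0[i] * cphi[i] * cphi[i] + tmp1[i] * len[i];
    }
}

int main()
{
    // Ask the runtime what the compiler allocated for this kernel.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, computeCtheta);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Ask device 0 how many registers a block may use in total.
    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("registers per thread:        %d\n", attr.numRegs);
    printf("local memory per thread:     %lu bytes\n",
           (unsigned long)attr.localSizeBytes);
    printf("registers per block (dev 0): %d\n", prop.regsPerBlock);
    printf("rough max threads per block: %d\n",
           maxThreadsByRegisters(attr.numRegs, prop.regsPerBlock));
    return 0;
}
```

As far as I know, asking for more threads than the registers allow normally fails with a "too many resources requested for launch" error rather than a timeout, so a timeout points at run time, not at register pressure.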
I ran my program on Linux (CentOS 5.3) and got the timed-out error when my kernel ran for more than 10 seconds or so.
Is this limit imposed by the operating system or by the NVIDIA driver/CUDA? My application is scientific computing, where kernels that run longer than 10 seconds are very common. Is there a way around this limit?
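One workaround that is often suggested when the watchdog cannot be avoided is to split the work into several shorter launches, so that no single kernel runs long enough to trip the timer. A minimal sketch, assuming the work divides into independent chunks: scaleChunk, numChunks and the sizes are placeholders, not anything from the posts here. cudaDeviceSynchronize is the current name of the call; toolkits of the 2.x era used cudaThreadSynchronize instead.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of launches needed to cover n items
// when each launch handles at most `chunk` items.
static int numChunks(int n, int chunk)
{
    return (n + chunk - 1) / chunk;
}

// Placeholder kernel: processes `count` items starting at `offset`.
__global__ void scaleChunk(float *data, int offset, int count, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        data[offset + i] *= factor;
    }
}

int main()
{
    const int n = 1 << 20;       // total number of items (assumed)
    const int chunk = 1 << 18;   // items per launch (assumed, tune as needed)
    const int threads = 256;

    float *d_data = NULL;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    for (int c = 0; c < numChunks(n, chunk); ++c) {
        int offset = c * chunk;
        int count = (n - offset < chunk) ? (n - offset) : chunk;
        int blocks = (count + threads - 1) / threads;

        scaleChunk<<<blocks, threads>>>(d_data, offset, count, 2.0f);

        // Wait for this chunk to finish, so each launch stays short
        // and any error is reported against the chunk that caused it.
        cudaError_t err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            fprintf(stderr, "chunk %d: %s\n", c, cudaGetErrorString(err));
            cudaFree(d_data);
            return 1;
        }
    }

    cudaFree(d_data);
    return 0;
}
```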
Thanks for the tips. Unfortunately, I tried methods 2 and 3: whenever I use atomicExch in the code together with the -arch compute_11 option, --keep does not produce a cubin, and --ptxas-options=-v does not report the register count either.
Timed-out error:
I really don’t know why you get a timeout error on Linux. In my experiments, my program ran very slowly (more than 30 minutes), but no timeout error occurred.
By the way, do you use your NVIDIA GPU for two purposes (display and computing)?
Registers:
I have used all three methods above and they work perfectly on Windows XP. I have never checked the register count for programs running on Linux, so I hope someone else can kindly offer some suggestions.
The recent deviceQuery samples in SDK 2.2.x include some code to query whether the watchdog timer is active.
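The same check can be done in a few lines. This is a minimal sketch, not the SDK sample itself: it reads the kernelExecTimeoutEnabled field of cudaDeviceProp for every device, and watchdogStatus is just a hypothetical helper for printing.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: turn the property flag into a readable label.
static const char *watchdogStatus(int kernelExecTimeoutEnabled)
{
    return kernelExecTimeoutEnabled ? "active" : "inactive";
}

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "device %d: %s\n", dev, cudaGetErrorString(err));
            continue;
        }
        // Non-zero means kernels on this device are subject to a run time limit.
        printf("device %d (%s): watchdog %s\n",
               dev, prop.name, watchdogStatus(prop.kernelExecTimeoutEnabled));
    }
    return 0;
}
```

If a device reports the watchdog as active, it is usually the one driving a display.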
Are you accessing arrays in an extremely uncoalesced way? I don’t quite see why the operations you posted should take so long. Double-check that you’re not accidentally running this in a <<<1,1>>> launch configuration :)
In my experience, the register count reported by the nvcc flag “--ptxas-options=-v” is very accurate.