Disabling Runtime Execution Limit

little.dude64.ld · February 20, 2021, 11:29pm

Hey there,
I’ve been attempting to get around the run time limit for my computations on my Jetson Nano 2GB.

I’ve tried everything from running systemctl set-default multi-user.target to boot into text mode, to setting “Interactive” “0” in the xorg.conf file.

I did find that there are two files that start with xorg in the /etc/X11 folder: xorg.conf and xorg.conf.jetson. Both have a Device section with identical attributes, and modifying both with the Interactive option didn’t work either. I’m not sure which system to disable to get my kernels to run until termination. Any help I can get will be greatly appreciated.

kayccc · March 4, 2021, 12:53am

Sorry for the late response. Is this still an issue that needs support?

Thanks

little.dude64.ld · March 4, 2021, 5:28am

Yes, I’ve still been unable to allow my kernels to execute without a runtime limit.

The weird thing to me is that I’ve tried set-default multi-user.target to boot into text mode, and from there running sudo init 3, which by my account should ultimately disable any limit. But deviceQuery still reports that it’s on, and my kernels terminate after 4 or 5 seconds.

kayccc · March 5, 2021, 11:50am

Could you please provide more information on what exactly the statement below means:
“I’m not sure what system to disable to get my kernels to run until termination”

Which kernels are you trying to run till termination?

little.dude64.ld · March 5, 2021, 3:39pm

I’m coding in CUDA and I’m testing things out.
I made a really simple kernel that adds up to a certain value and then terminates, but there’s a runtime limit on the GPU which terminates the kernel prematurely, causing errors.

When I run deviceQuery it’s one of the listed properties, and I’ve been attempting to shut it off. Its proper property name is:

Run time limit on kernels: Yes

When I look up the property with the runtime library, it’s under the attribute

cudaDevAttrKernelExecTimeout: 1

As I understand it, I should be able to disable this and have my kernels run without being terminated after 4 to 5 seconds.

I’ve tried:

sysctl kernel.watchdog=0
nmi.watchdog=0
soft_watchdog=0
Running on init 3

I’ve tried many things that seem to work for others running Linux on different machines. I feel the Jetson Nano should be capable of it too.
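
The check described in this post can be scripted so it is easy to repeat after each attempted fix. This is only a minimal sketch: it assumes the deviceQuery sample from the CUDA toolkit has been built and that it prints the "Run time limit on kernels" line quoted above. The function name kernel_timeout_status is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads deviceQuery output on stdin and prints the value
# of the first "Run time limit on kernels" line it finds.
kernel_timeout_status() {
    awk -F': *' '/Run time limit on kernels/ && !seen { print $2; seen = 1 }'
}
```

It would be used as ./deviceQuery | kernel_timeout_status, run from wherever the deviceQuery binary was built. It only reports what the runtime says; it does not change anything.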

kayccc · March 17, 2021, 2:55am

Please run “echo N > /sys/kernel/debug/gpu.0/timeouts_enabled” to see if it helps.

This is a generic knob, though, not one specific to the KMD watchdog. It will also disable a few other timeouts (such as the semaphore acquire timeout).
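
For anyone scripting this, here is a minimal sketch that wraps the echo above in a small helper. The function name and the path argument are made up for illustration; only the debugfs path itself comes from this post. It assumes a root shell, since debugfs is normally readable and writable by root only.

```shell
#!/bin/sh
# Hypothetical helper: writes N to the given knob and prints the value read back.
# The path is an argument so the function can be tried on an ordinary file first.
disable_gpu_timeouts() {
    knob="$1"
    if [ ! -w "$knob" ]; then
        echo "cannot write to $knob (not root, or path missing?)" >&2
        return 1
    fi
    echo N > "$knob" || return 1
    cat "$knob"
}
```

From a root shell it would be called as disable_gpu_timeouts /sys/kernel/debug/gpu.0/timeouts_enabled. Reading the value back is just a sanity check that the write landed.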

little.dude64.ld · March 17, 2021, 4:25am

Beautiful, this worked great for my purposes. I truly appreciate your help

usher · April 25, 2021, 7:17pm

Hi,
I had the same issue and was able to solve it as above by changing the value of “timeouts_enabled”. This is fine until I reboot; then the value gets reset. Is there a way to set this value permanently?
Thanks!

little.dude64.ld · April 25, 2021, 8:11pm

I personally think that having it be a value you modify intentionally every time is a better, safer alternative.

I don’t know of any methods for setting it permanently, but I’ve made a script I run as a custom command on my system. It’s just a simple command that I think you could figure out how to put into the .bashrc file and run at boot. That doesn’t permanently set the value on the system, but it has the same effect of not having to set it manually yourself.

sudo bash -c "echo N > /sys/kernel/debug/gpu.0/timeouts_enabled"

This does ask you for the password, so maybe run it from the .bashrc file for root?

No real clue, but I hope this could be a lead for you. Best of luck.

usher · April 25, 2021, 8:23pm

Thanks!
I’m trying to run a small cluster of Jetson Nanos (4GB) which are on a local network with a Jetson AGX as the primary node. So I was hoping for something that didn’t need to touch each Nano each time I turn everything on…

little.dude64.ld · April 25, 2021, 8:27pm

Oh, I see. I’m just getting started, so I’m kind of stuck right here. Good luck with your cluster.

little.dude64.ld · May 6, 2021, 8:45am

I know you were probably looking for something else, but I’ve recently been learning about cron and think it has a chance of working for you.

Run the command I showed earlier with the @reboot macro and root permissions.

Maybe it could work?
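
A minimal sketch of the cron idea, assuming the entry goes into root's crontab so no password prompt is involved. The helper name make_reboot_entry is made up for illustration. Whether the gpu.0 debugfs node already exists by the time cron fires at boot is untested here, so a short sleep before the echo may turn out to be necessary.

```shell
#!/bin/sh
# Hypothetical helper: prints a crontab line that writes N to the given knob at boot.
make_reboot_entry() {
    printf '@reboot echo N > %s\n' "$1"
}

# Possible way to install it (appends to root's existing crontab):
#   ( sudo crontab -l 2>/dev/null; \
#     make_reboot_entry /sys/kernel/debug/gpu.0/timeouts_enabled ) | sudo crontab -
```

Running the install line twice adds the entry twice, so it is worth checking sudo crontab -l first. On a cluster, the same line would have to be installed once on each Nano, but after that no node needs to be touched at power-on.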