Monitoring kernel calls in emulation mode

Jameshobbs · February 21, 2009, 9:23am

Hello,
I have integrated CUDA graphics cards into my school’s high-throughput computing environment, which is managed by Condor (http://www.cs.wisc.edu/condor/).

I am worried about malicious users submitting jobs whose CUDA kernel contains an infinite loop. Essentially, I want to detect these infinite loops before they run on the GPU. Because Condor jobs are submitted without requiring X, I have not seen the 5-second kernel time limit (the "5s rule") kick in.

One idea I’ve come up with is to first run a user’s CUDA code in emulation mode on the CPU and set a time limit for each kernel call. If a kernel runs for too long, the job is killed and an error message is sent to the user who submitted it. The only problem is that I don’t know how to monitor individual kernel calls.
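
To make the time-limit part concrete, here is a rough sketch in Python of the kind of wrapper I have in mind. It assumes the job has already been built in emulation mode into an ordinary executable. The function name `run_with_time_limit` and the executable path are made up for illustration. It only limits the whole process, not individual kernel calls, which is the part I still don't know how to do.

```python
import subprocess
import sys


def run_with_time_limit(cmd, limit_s):
    """Run cmd and enforce a wall-clock limit of limit_s seconds.

    Returns (finished, returncode). If the limit is exceeded, the child
    process is killed and (False, None) is returned.
    """
    try:
        done = subprocess.run(cmd, timeout=limit_s)
        return True, done.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising,
        # so nothing is left running at this point.
        return False, None


# Hypothetical usage: "./job_emu" stands for a user's emulation-mode binary.
# finished, rc = run_with_time_limit(["./job_emu"], 60)
# if not finished:
#     print("job exceeded the time limit and was killed", file=sys.stderr)
```

A job that fails this check would be rejected before it ever reaches the GPU. The obvious drawback is that emulation is much slower than the real device, so the limit would have to be generous.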

Any other ideas on how to kill a job that has an infinite loop in its CUDA kernel are much appreciated.