Hi everyone, I am training an owl_Vit model and I am getting this error during training. It was working before but has suddenly stopped working. Now, whenever I start training, the error appears after 3 to 4 minutes.
The usual reason for that error is that CUPTI allows only one user (nominally, a process) at a time. If the training you are launching uses multiple processes, you may see this error. I can't tell you exactly what to change in your setup to avoid it. CUPTI is a profiling tool, so whatever training framework you are using is evidently adding instrumentation beyond what training itself needs. There may be a way to go into the framework and disable this instrumentation/profiling/CUPTI usage to avoid the issue.
You can find other reports like this one. They tend to arise either because you are doing something incorrectly (e.g. using a profiler yourself) or because a defect was introduced in the framework.
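If your own training script is what starts the profiler, one way to rule it out is to make profiling opt-in. Below is a minimal sketch of that idea, not anything specific to your framework: the `ENABLE_PROFILING` variable, `maybe_profile`, and `train` are hypothetical names, and `make_profiler` stands in for whatever profiler context manager your code actually creates.

```python
import contextlib
import os


def profiling_enabled(env=None):
    """True only when ENABLE_PROFILING (a hypothetical variable) is set to "1"."""
    env = os.environ if env is None else env
    return env.get("ENABLE_PROFILING", "0") == "1"


def maybe_profile(make_profiler, env=None):
    """Return the real profiler context if enabled, otherwise a no-op context."""
    if profiling_enabled(env):
        return make_profiler()
    # nullcontext does nothing on enter/exit, so no profiler is started.
    return contextlib.nullcontext()


def train(num_steps, make_profiler, env=None):
    """Stand-in training loop; a real training step would replace the counter."""
    steps_done = 0
    with maybe_profile(make_profiler, env):
        for _ in range(num_steps):
            steps_done += 1  # placeholder for one real training step
    return steps_done
```

With the variable unset, the loop runs without ever constructing the profiler, so if the error goes away you know the profiler in your script was the second CUPTI user. If the error persists, the instrumentation is coming from somewhere else (the framework itself or another process).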