Unable to cancel clara job

Hi, experts.

I wanted to stop an unfinished Clara job. I ran the following commands, but the cancel request was ignored.
Is there anything I need to do before canceling the job?

I'm also still waiting for a reply to this question (Called AE Title Not Recognized).

Thanks.

root@v100-02:~# clara cancel jobs --all
Successfully canceled 0 job(s).

root@v100-02:~# clara list jobs
NAME JOB_ID PAYLOAD_ID START_TIME END_TIME
covidaet-covid-00014 67cae3a708dc4a40840f8a298e9d2b29 bc6a73eadd0648a5ad1a4887b6d99ae0 2020-11-18 17:54:34 +0900 KST -

root@v100-02:~# clara cancel jobs -j 67cae3a708dc4a40840f8a298e9d2b29
Code: -8456, Ignored request to cancel job: {67cae3a708dc4a40840f8a298e9d2b29}. Unable to complete Jobs::Cancel, cancellation of job {67cae3a708dc4a40840f8a298e9d2b29} has been rejected by scheduler.


Currently, the publicly available version of Clara does not support canceling running jobs, only pending ones. This capability will be available in 0.7.4.

There is a known issue (also fixed in 0.7.4) where failed jobs persist in the running state. To remove the pod and associated job you will have to use the following:

  1. List the running pod:

    kubectl get pods | grep covid

  This will provide the name of the COVID-19 pod. You can delete the pod using:

    kubectl delete pod <running-pod-name>

  2. Delete the pipeline job. First get the pipeline job:

    kubectl get pipelinejobs | grep 67cae3

  to get the failing job ID (this is necessary because our UUID format does not contain hyphens), then delete it using:

    kubectl delete pipelinejobs <job-id>

This should remove both the pod and the job. These actions manage Clara resources outside the expected controls, so the platform may need to be restarted afterwards (clara platform restart && clara dicom restart).
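The manual steps above can be sketched as a small shell helper. This is only an illustration, not part of the Clara CLI: the function names find_by_name and cleanup_job are made up for this post, and it assumes kubectl is already configured for the cluster that runs Clara. It deletes every match, so check the grep patterns with kubectl get first.

```shell
# Sketch only: helper names are hypothetical, not Clara commands.

# Print the names of resources of kind $1 whose name contains $2.
find_by_name() {
  local kind="$1" pattern="$2"
  kubectl get "$kind" --no-headers -o custom-columns=NAME:.metadata.name \
    | grep -- "$pattern" || true
}

# Delete the stuck pod(s), then the matching pipeline job(s).
#   $1: text that appears in the pod name (for example, covid)
#   $2: leading characters of the job ID (for example, 67cae3)
cleanup_job() {
  local pod_pattern="$1" job_prefix="$2" name
  for name in $(find_by_name pods "$pod_pattern"); do
    kubectl delete pod "$name"
  done
  for name in $(find_by_name pipelinejobs "$job_prefix"); do
    kubectl delete pipelinejobs "$name"
  done
}
```

For the job in this thread the call would be cleanup_job covid 67cae3, followed by the platform restart mentioned above if Clara behaves oddly afterwards.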

Thank you for the response! I'll try it.

I also have another question.

When I try to pull an image from NGC, I get the error below.

"kubectl describe pods" shows the message ImagePullBackOff.

Does the dicom-seg-writer image not exist, or is the command wrong?

Hi, it seems that image isn't publicly available yet, but it will be in v0.7.3.
See https://ngc.nvidia.com/catalog/containers?orderBy=scoreDESC&pageNumber=0&query=dicom&quickFilter=containers&filters=

If you are looking to pull a private container (such as for early access), use the private org/team configuration you have been provided, but make sure to run docker login nvcr.io first and enter your API key. In that case you would run docker pull nvcr.io/my-ea-org/my-ea-team/dicom-seg-writer:0.7.3-2011.5

@hrlee thanks for reporting the issue. The dicom-seg-writer container should be in 0.7.2, the public release on NGC. We'll look into why it does not appear in the public release of Clara Deploy on NGC.

Also, I just checked: the EA version of dicom-seg-writer exists in 0.7.2; it is only missing in GA.
docker pull nvcr.io/ea-nvidia-clara/clara/dicom-seg-writer:0.7.2-2009.3

So, if you have EA access, there are two options to work around the issue:

  1. Pull the Docker image as shown above, and then tag it with the name and version shown in your error. This avoids re-publishing the pipeline.
  2. Modify the pipeline definition in question, and replace the Docker image name and version with the one from EA as stated above. This requires re-publishing the pipeline.
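Option 1 could look roughly like the sketch below. The function name pull_and_retag is made up for illustration, and the target image name is deliberately left as an argument: use exactly the name and tag that appear in your ImagePullBackOff error. It assumes you have already run docker login nvcr.io with an API key that has EA access, and that you run it on the node where the pod is scheduled.

```shell
# Sketch only: pull_and_retag is a hypothetical helper, not a Clara or Docker command.
#   $1: image to pull (the EA image)
#   $2: name and tag the pipeline expects (copy it from the error message)
pull_and_retag() {
  local source_image="$1" target_image="$2"
  if [ -z "$source_image" ] || [ -z "$target_image" ]; then
    echo "usage: pull_and_retag <source-image> <target-image>" >&2
    return 2
  fi
  # Tag only if the pull succeeded.
  docker pull "$source_image" && docker tag "$source_image" "$target_image"
}
```

For example: pull_and_retag nvcr.io/ea-nvidia-clara/clara/dicom-seg-writer:0.7.2-2009.3 <image-name-from-your-error>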

Sorry for the inconvenience this has caused.

Hi, @Ming_Q.
Thank you for your reply!

It seems that I don't have EA access. I can't pull the EA version of dicom-seg-writer.

docker login nvcr.io

I also have another question. Doesn't the publicly available version of Clara support canceling pending jobs?

I checked in the web UI that the jobs are in "pending" status. Maybe that is because of a lack of resources.

The clara cancel jobs command doesn't work. Also, I can't find any running pod except covid, so I can't use the method suggested by @Alvin. Is there any other way?

Thank you always for your support!

#1 The missing dicom-seg-writer in 0.7.2 GA is being addressed by the team as we speak.
#2 Canceling pending jobs is not supported. Clara Deploy version 0.7.4, ETA early December, will have additional support.

Thanks again for test driving Clara Deploy!

OK! I will wait until then. :)
I really appreciate your help!

Thanks for the wait.

The DICOM Seg Writer operator/Docker image is accessible on NGC now. Please see the version tags at the URL below