tao-client fails to cancel a job

$ tao-client --version
tao-client, version 4.0.0

$ tao-client lprnet model-job-cancel
id: 851d4061-9c3d-4568-96f6-0c0d6a891a66
job: 0fcde70b-4187-493e-8424-a9b612081c09
Traceback (most recent call last):
  File "/home/nvidiatao/.local/bin/tao-client", line 8, in <module>
    sys.exit(cli())
  File "/home/nvidiatao/.local/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/nvidiatao/.local/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/nvidiatao/.local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/nvidiatao/.local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/nvidiatao/.local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/nvidiatao/.local/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/nvidiatao/.local/lib/python3.8/site-packages/tao_cli/networks/lprnet.py", line 151, in model_job_cancel
    model_obj.model_job_cancel(id, job)
AttributeError: 'Model' object has no attribute 'model_job_cancel'

Could you share the full notebook as well?
How about other commands? Are they successful?

Hi @Morganh,

Only model_create works; none of the other commands work.

I have checked the Python code and found some issues.

The Model class defines only one method (model_create), yet lprnet.py calls other methods on it, such as model_job_cancel.
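One way to confirm this kind of mismatch is to list the public methods a class actually provides and compare them with the names the CLI wrapper calls. The following is a minimal sketch, not part of tao-client: `Actions` and `Model` here are stand-ins that mirror the classes quoted in this thread, and `public_methods` and `missing_methods` are hypothetical helper names.

```python
import inspect


class Actions:
    """Stand-in for tao_cli.cli_actions.actions.Actions (contents assumed)."""

    def __init__(self):
        self.base_url = "http://localhost:32080/api/v1"  # placeholder value
        self.headers = {}


class Model(Actions):
    """Mirrors the Model class from this thread: only model_create is defined."""

    def model_create(self, network_arch, encryption_key):
        raise NotImplementedError("stub for illustration")


def public_methods(cls):
    """Return the sorted names of the public methods available on cls."""
    return sorted(
        name
        for name, _ in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_")
    )


def missing_methods(cls, expected):
    """Return the names in expected that cls does not provide."""
    available = set(public_methods(cls))
    return [name for name in expected if name not in available]


# Method names that lprnet.py is shown calling in this thread.
called_by_cli = ["model_create", "model_job_cancel"]
print(public_methods(Model))                  # ['model_create']
print(missing_methods(Model, called_by_cli))  # ['model_job_cancel']
```

On a real installation the same helpers could be pointed at the installed class (`from tao_cli.cli_actions.model import Model`) instead of the stand-in.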

/home/nvidiatao/.local/lib/python3.8/site-packages/tao_cli/networks/lprnet.py
from tao_cli.cli_actions.model import Model

.
.
.
dataset_obj = Dataset()
model_obj = Model()

.
.
.

@lprnet.command()
@click.option('--id', prompt='id', help='The model ID.', required=True)
@click.option('--job', prompt='job', help='The job ID.', required=True)
def model_job_cancel(id, job):
    model_obj.model_job_cancel(id, job)
    click.echo(f"{job}")
.
.
.

The content of tao_cli.cli_actions.model

# SPDX-FileCopyrightText: Copyright (c) 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: LicenseRef-NvidiaProprietary
#
# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual
# property and proprietary rights in and to this material, related
# documentation and any modifications thereto. Any use, reproduction,
# disclosure or distribution of this material and related documentation
# without an express license agreement from NVIDIA CORPORATION or
# its affiliates is strictly prohibited.

import json
import requests

from tao_cli.cli_actions.actions import Actions

class Model(Actions):
    def __init__(self):
        super().__init__()

    def model_create(self, network_arch, encryption_key):
        data = json.dumps( {"network_arch": network_arch, "encryption_key": encryption_key} )
        endpoint = self.base_url + "/model"
        response = requests.post(endpoint, data=data, headers=self.headers)
        id = response.json()["id"]
        return id
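For illustration only, a missing `model_job_cancel` method could look roughly like the sketch below. This is not NVIDIA's implementation: the `Actions` stand-in, its `base_url` and `headers` values, and the helper `job_cancel_endpoint` are assumptions, and the URL shape is copied from the curl cancel request later in this thread. It uses `urllib` from the standard library instead of `requests` so that it runs without extra packages.

```python
import urllib.request


class Actions:
    """Stand-in for tao_cli.cli_actions.actions.Actions (attributes assumed)."""

    def __init__(self, base_url="http://localhost:32080/api/v1/user/<user_id>",
                 headers=None):
        self.base_url = base_url
        self.headers = headers or {"Authorization": "Bearer <auth_key>"}


class Model(Actions):
    def job_cancel_endpoint(self, id, job):
        # Hypothetical helper: same URL shape as the curl cancel request.
        return f"{self.base_url}/model/{id}/job/{job}/cancel"

    def model_job_cancel(self, id, job):
        # Hypothetical method: POST an empty body to the cancel endpoint
        # and return the HTTP status code.
        request = urllib.request.Request(
            self.job_cancel_endpoint(id, job),
            data=b"",
            headers=self.headers,
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            return response.status
```

A method like this would only remove the AttributeError on the client side. Since the direct REST cancel call reported in this thread returned nothing and did not stop the job, it would not by itself guarantee that the training stops.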

If you followed a notebook to run the commands, which Jupyter notebook was it?

I’m using tao-getting-started_v4.0.0/notebooks/tao_api_starter_kit/client/automl/lprnet.ipynb.

The notebook has no issues; everything in it works well.

My question is: what is the procedure to stop a training? Neither method I have tried (tao-client and the TAO API) works, and the notebook has no procedure for stopping a training.

By mistake I started two trainings: I ran the cell twice, and I really need to stop one of them.

I had to find the training processes on the OS and kill them all.

Using tao-client I can create a job, but I cannot cancel one because of the error raised.

BASE_URL=http://localhost:32080/default/api/v1
tao-client login --ngc-api-key <api_key>
USER="user_id"
TOKEN="token_id"
tao-client lprnet model-train --id  # works
tao-client lprnet model-job-cancel  # fails

Using the TAO API

Cancel does not work: it returns nothing and does not stop the job.

curl "http://myhost.amazonaws.com:32080/api/v1/user/<userid>/model/59fd22a8-57bc-489e-8530-3a346c1177f9/job/0847956d-7ceb-4cc3-85b1-9cf1dd7cffe3/cancel" \
  -X POST \
  -H "authorization: Bearer  <auth_key>" 

Deleting a job works well. The job is deleted if it is not running.

curl "http://myhost.amazonaws.com:32080/api/v1/user/<user_id>/model/59fd22a8-57bc-489e-8530-3a346c1177f9/job/e0df15e0-222b-4daa-83bf-c562fc044cdf" \
  -X DELETE \
  -H "authorization: Bearer  <auth_key>" 

Querying the job status works without issues.

curl "http://myhost.amazonaws.com:32080/api/v1/user/<user_id>/model/59fd22a8-57bc-489e-8530-3a346c1177f9/job/0847956d-7ceb-4cc3-85b1-9cf1dd7cffe3" \
  -H "authorization: Bearer <auth_key>"
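The three REST calls above (cancel, delete, status) differ only in HTTP method and URL suffix, so they can be described by one small helper. The sketch below only builds the request objects and sends nothing. `job_request` and its parameter names are hypothetical, and the URL layout is taken from the curl commands in this post.

```python
import urllib.request


def job_request(host, user_id, model_id, job_id, token, action="status"):
    """Build (but do not send) a request for one job action."""
    base = f"{host}/api/v1/user/{user_id}/model/{model_id}/job/{job_id}"
    # HTTP method and URL for each action, as used in the curl commands.
    routes = {
        "status": ("GET", base),
        "cancel": ("POST", base + "/cancel"),
        "delete": ("DELETE", base),
    }
    method, url = routes[action]
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method=method,
    )


req = job_request(
    "http://myhost.amazonaws.com:32080",
    "<user_id>",
    "59fd22a8-57bc-489e-8530-3a346c1177f9",
    "0847956d-7ceb-4cc3-85b1-9cf1dd7cffe3",
    "<auth_key>",
    action="cancel",
)
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` would send it. Polling the `status` action after a `cancel` is one way to check whether the job actually stopped.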

I can reproduce this with the above command. We will check internally.

We will provide a new tao-client wheel. Please stay tuned.

News?

There is no update yet.