Riva Bot Maker

Please provide the following information when requesting support.

Hardware - GPU (A100/A30/T4/V100)
Hardware - CPU
Operating System
Riva Version
TLT Version (if relevant)
How to reproduce the issue? (This is for errors. Please share the command and the detailed log here.)

Hi,
I am using Riva Bot Maker. I am training an intent slot classification model using the following bash script:

> bash run_jarvis_domain_builder.sh train domain_model \
>     --dataset_path datasets/business_enquiry/business_enquiry.yaml \
>     --result_path models \
>     --version 1 \
>     --epochs 10 \
>     --domain business_enquiry \
>     --batch_size 1

I am getting the following error:

2022-07-27 09:38:23,536 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
Exception from training module: Missing TLT model checkpoint, Training Failed
Failed to train model with error Missing TLT model checkpoint, Training Failed
Domain Builder CLI failed with error Missing TLT model checkpoint, Training Failed
Exiting DomainBuilder Container

I have followed all the setup steps, including pulling the appropriate containers.
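
For anyone trying to narrow down the same error, the two checks implied above (is the image actually pulled, and did training leave any checkpoint in the result path) can be sketched in bash. This is only a rough sketch: the helper names `find_checkpoints` and `image_present` are hypothetical and not part of Riva Bot Maker, and the `*.tlt` file pattern is an assumption about how checkpoints are named.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical and not part of Riva Bot Maker.

# List candidate checkpoint files under a results directory.
# Assumption: TLT checkpoints are saved with a .tlt extension.
find_checkpoints() {
    local result_dir="$1"
    # Fail early if the directory does not exist.
    [ -d "$result_dir" ] || return 1
    find "$result_dir" -type f -name '*.tlt'
}

# Succeed only if the given Docker image is available locally.
image_present() {
    docker image inspect "$1" >/dev/null 2>&1
}
```

Running `find_checkpoints models` after a failed run, and `image_present nvcr.io/nvidia/tlt-pytorch:v3.0-dp-py3` before starting one, may help show whether the training step produced no output at all or whether the output landed somewhere other than the result path.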

GPU = Tesla T4
Tlt_pytorch_image = "nvcr.io/nvidia/tlt-pytorch:v3.0-dp-py3"
Operating System =

NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"

Thanks

Hi @ayush.raj

Thanks for your interest in Riva.

I will check with the team about your issue and get back to you.

Thanks

Hi @rvinobha,
Any updates on this?

Hi @rvinobha,
Did you get a chance to reach out to the team?

Hi @ayush.raj

I notified the team about your issue last month.
I guess they may have reached out to the Quantiphi team they are working with. I will check with my team again.

Thanks