Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc)
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) Classification: efficientnet_b1_relu
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) v4.0.2
• Training spec file (If you have one, please share it here)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
I am using the AutoML notebook notebooks/tao-getting-started_v4.0.2/notebooks/tao_api_starter_kit/api/automl/classification.ipynb
I want to select efficientnet_b1_relu for my experiments, and I specify that in the Assign PTM section.
When I run the training action, the monitoring page refreshes, but it keeps saying it will update after the first epoch. A long time passes and it never updates. Is there any way I can access the logs to figure out what’s going wrong?
I mean the following issue: “nvidia driver modules are not yet loaded invoking run directly”. Previously you did not hit it and could get the log via “kubectl logs -f xxx”, right?
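For reference, here is a minimal sketch of how the pod logs could be followed with kubectl. It assumes kubectl is already configured for the cluster that hosts the TAO API. The helper name follow_pod_logs is hypothetical and only for illustration, and the pod name is a placeholder that you need to look up yourself with “kubectl get pods”.

```shell
# Hypothetical helper (not part of TAO): stream the logs of one pod.
# Assumes kubectl already points at the cluster running the TAO API.
follow_pod_logs() {
    pod_name="$1"
    if [ -z "$pod_name" ]; then
        echo "usage: follow_pod_logs <pod-name>" >&2
        return 1
    fi
    # -f keeps the stream open and prints new log lines as they arrive
    kubectl logs -f "$pod_name"
}

# Typical use:
#   kubectl get pods              # find the name of the training pod
#   follow_pod_logs <pod-name>    # then follow its log
```

If the pod is not in the default namespace, the plain kubectl commands also accept “-n <namespace>”.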
Could you please restore the previous environment, since you were able to run successfully under 4.0.2 before?
You can run the command below to uninstall.
$ bash setup.sh uninstall