Having issues on VLM workflow

Hi @sanujen.20,

Here are some troubleshooting steps:

  1. Stop all JPS services

Stop any JPS foundation services you launched with systemctl, such as redis, monitoring, or ingress.

For example, to stop redis:
sudo systemctl stop jetson-redis

  2. Stop all Docker containers

If any of the workflows are running, such as ai-nvr, zero-shot detection, or the VLM service, bring them down with their respective docker compose down commands.

Check for any other running docker containers with

docker ps

Then stop any that remain:

docker stop <container_name>
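
The two commands above can be combined into one pass. This is a sketch that assumes the docker CLI is on your PATH; xargs -r is the GNU option that skips running docker stop when no containers are listed:

```shell
#!/bin/sh
# Stop every running container in one pass.
if command -v docker >/dev/null 2>&1; then
  docker ps -q | xargs -r docker stop
else
  echo "docker not found; nothing to stop"
fi
```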

  3. Delete the VLM model folder

If the VLM model was partially downloaded or failed to finish quantization, delete the model folder so that the next time the VLM service is launched it will redownload and optimize the model.

It is easiest to delete the whole VLM folder. This will delete any downloaded VLM models.

sudo rm -rf /data/vlm

Be careful here: do not delete the entire /data folder, only the vlm folder inside it.

  4. Restart the Jetson

  5. Verify there is sufficient disk space

When the Jetson comes back up, run

df -h /data

This will print the disk usage for the /data folder. Verify there is sufficient space for the VLM model; VILA 2.7B requires 7.1 GB.
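
If you want a scripted check instead of eyeballing the df output, a minimal sketch is below. The 8 GB threshold is an assumption that just adds headroom over the 7.1 GB the model needs:

```shell
#!/bin/sh
# check_space: warn if the given path's filesystem has less free space
# than required. Usage: check_space <path> <required_gb>
check_space() {
  path=$1
  required_kb=$(( $2 * 1024 * 1024 ))
  # -P forces one line per filesystem; column 4 is available KB.
  avail_kb=$(df -Pk "$path" | awk 'NR==2 {print $4}')
  if [ "${avail_kb:-0}" -ge "$required_kb" ]; then
    echo "OK: enough free space on $path"
  else
    echo "LOW: less than ${2} GB free on $path"
  fi
}

# On the Jetson you would run: check_space /data 8
check_space / 0   # sanity check against the root filesystem
```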

  6. Launch the VLM service again, configured with VILA2.7b

Follow the documentation to launch the necessary foundation services and the VLM service again.

  7. Monitor the memory usage and the VLM status

Monitor the memory usage with top or jtop.
Check the VLM status at the health endpoint http://0.0.0.0:5015/v1/health
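
Since the model can take a while to load, it can help to poll the health endpoint rather than check it once. A small sketch (the port and path come from the guide above; the attempt count and delay are arbitrary defaults):

```shell
#!/bin/sh
# wait_healthy: poll a health URL until curl gets a success response
# or the attempts run out. Usage: wait_healthy <url> [attempts] [delay_s]
wait_healthy() {
  url=$1; attempts=${2:-60}; delay=${3:-5}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -sf "$url" >/dev/null 2>&1; then
      echo "healthy"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out"
  return 1
}

# On the Jetson: wait_healthy http://0.0.0.0:5015/v1/health
```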

If this does not work, try using “Efficient-Large-Model/VILA1.5-3b” and see if you get different results from this.

Thanks,
Sammy Ochoa