Having issues with the VLM workflow

I tried to run the smaller VILA-2.7b model for the platform services on an Orin Nano (8GB), and I also added 32GB of swap memory.

But the service froze partway through, and I was unable to use the platform's API calls for the alert features.

The swap memory is fully utilized even for this tiny model, while the docs say the larger model requires 32GB of memory.

Moving this to the JPS forum. I will check and provide feedback.

Hi @sanujen.20,

Here are some troubleshooting steps:

  1. Stop all JPS services

Stop any JPS foundation services you launched with systemctl, such as redis, monitoring, or ingress.

For example, to stop redis:
sudo systemctl stop jetson-redis
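If you want to stop several services in one go, a small helper like the one below can do it. This is only a sketch: of the unit names, only jetson-redis is confirmed in this thread; the others are assumptions, so check `systemctl list-units 'jetson-*'` for the real names on your system.

```shell
# Hedged sketch: stop a list of JPS foundation services in one loop.
# Only jetson-redis is confirmed above; other unit names are assumptions.
stop_services() {
  for svc in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "sudo systemctl stop $svc"  # dry run: print the command instead of running it
    else
      sudo systemctl stop "$svc"
    fi
  done
}

# On the Jetson (verify the unit names first):
#   stop_services jetson-redis jetson-monitoring jetson-ingress
```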

  2. Stop all docker containers

If any of the workflows are running, such as ai-nvr, zero-shot detection, or the vlm service, bring them down with their respective docker compose down commands.

Check for any other running docker containers with:

docker ps

Then stop any running docker containers:

docker stop <container_name>

  3. Delete the VLM model folder

If the VLM model was partially downloaded or failed to finish quantization, you will need to delete the model folder so that the next time the VLM service is launched it will redownload and optimize the model.

It is easiest to delete the whole VLM folder. This will delete any downloaded VLM models.

sudo rm -rf /data/vlm

Be careful here: you do not want to delete the entire /data folder, just the vlm folder inside /data.
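If you want a guard rail for that delete, a wrapper like this refuses to remove anything whose path does not end in /vlm. The helper name is mine, not part of JPS; run it with sudo on the Jetson since /data/vlm is usually root-owned.

```shell
# Hedged sketch: delete a model folder only when its path actually ends in /vlm,
# guarding against the accidental "rm -rf /data" the warning above is about.
remove_vlm_dir() {
  dir="$1"
  case "$dir" in
    */vlm) rm -rf "$dir" && echo "removed $dir" ;;
    *) echo "refusing: $dir does not end in /vlm" >&2; return 1 ;;
  esac
}

# On the Jetson (run with sudo if /data/vlm is root-owned):
#   remove_vlm_dir /data/vlm
```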

  4. Restart the Jetson

  5. Verify there is sufficient disk space

When the Jetson comes back up, run

df -h /data

This will print out the disk usage for the /data folder. Verify there is sufficient space for the VLM model; VILA 2.7b requires 7.1 GB.
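You can also script the check. Below is a minimal sketch: the 8 GB threshold is my assumption, rounded up from the 7.1 GB figure above, and `df --output=avail` is GNU coreutils (present on JetPack's Ubuntu base).

```shell
# Hedged sketch: verify a path has at least N GB free before downloading the model.
# The 8 GB threshold is an assumption rounded up from the 7.1 GB mentioned above.
enough_space() {
  path="$1"; need_gb="$2"
  avail_kb=$(df --output=avail "$path" | tail -1)  # free 1K-blocks (GNU df)
  avail_gb=$((avail_kb / 1024 / 1024))
  if [ "$avail_gb" -ge "$need_gb" ]; then
    echo "OK: ${avail_gb} GB free on $path"
  else
    echo "LOW: only ${avail_gb} GB free on $path, want ${need_gb} GB"
    return 1
  fi
}

# On the Jetson: enough_space /data 8
```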

  6. Launch the VLM service again, configured to use VILA-2.7b

Follow the documentation to launch the necessary foundation services and the VLM service again.

  7. Monitor the memory usage and the VLM status

Monitor the memory usage with top or jtop.
Check the VLM status at the health endpoint http://0.0.0.0:5015/v1/health

If this does not work, try using “Efficient-Large-Model/VILA1.5-3b” and see if you get different results from this.

Thanks,
Sammy Ochoa

Yeah, it seems like sometimes the endpoints are working and sometimes they aren't.

I tried multiple times, and I can't predict when it will work and when it won't.

When I tried to send the alerts while the stream was being output, I encountered this behavior in the response from VILA-2.7b.

And sometimes the output just echoes the inputs I gave:
VLM Output: {r0 : "is there a bus?", r1: "is there a bicycle?"}

And sometimes like this,

Hi @sanujen.20,

Unfortunately, the 2.7b and 7b models do not work very well for alert mode. They struggle to follow the JSON format.

The 13b models work significantly better for alerts.

Since you are on Orin Nano, my recommendation would be to adjust the “alert_system_prompt” in the main_config.json for the VLM service.

This is where the model is told to follow the JSON output for alerts.

It will take trial and error, but you can adjust the system prompt to try and get better results with the 2.7b model.

Some things you could try:

  • Give it specific examples of the JSON output
  • Tell it not to repeat itself
  • Tell it to only output the exact number of rules that are provided in the input

In general, any issues you come across with the VLM output, you can describe it in the system prompt and tell the VLM how to improve its output.
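Purely as an illustration of those tips, an adjusted "alert_system_prompt" might look something like the fragment below. Only the field name comes from this thread; the surrounding structure of main_config.json and the exact answer format the service expects are assumptions you will need to adapt.

```json
{
  "alert_system_prompt": "You evaluate alert rules against the video frame. For each rule rN in the input, answer in JSON only, for example {\"r0\": \"yes\", \"r1\": \"no\"}. Output exactly one entry per rule provided, do not repeat entries, and do not restate the questions."
}
```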

Thanks,
Sammy Ochoa

Thanks, will try this.