ERROR: Failed to build visual engine

AWS EC2 instance: g6e.2xlarge

docker run --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --tmpfs /tmp:exec --name via-server --gpus '"device=all"' -p 9000:9000 -p 8000:8000 -e BACKEND_PORT=8000 -e FRONTEND_PORT=9000 -e NVIDIA_API_KEY= -e NGC_API_KEY= -e VLM_MODEL_TO_USE=vita-2.0 -v ~/vita-2.0:/root/.via/ngc_model_cache -e MODEL_PATH="ngc:nvidia/tao/vita:2.0.1" -v via-hf-cache:/tmp/huggingface --privileged=true -e VLM_BATCH_SIZE=1 nvcr.io/metropolis/via-dp/via-engine:2.0-dp

[09/05/2024-11:40:42] [TRT] [I] Exporting onnx
[09/05/2024-11:40:51] [TRT] [I] Building TRT engine for visual_encoder
[09/05/2024-11:40:51] [TRT] [I] [MemUsageChange] Init CUDA: CPU +1, GPU +0, now: CPU 16148, GPU 1632 (MiB)
[09/05/2024-11:41:00] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1820, GPU +314, now: CPU 18104, GPU 1946 (MiB)
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 867413657
[09/05/2024-11:41:01] [TRT] [I] Succeeded parsing /root/.via/ngc_model_cache/nvidia_tao_vita_2.0.1_vila-llama-3-8b-lita/trt-engines/fp16/0-gpu/visual_engines/onnx/visual_encoder.onnx
[09/05/2024-11:41:02] [TRT] [I] Processed image dims 384x384
[09/05/2024-11:41:02] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
autotuner.cpp:560: DCHECK(!lpm.mvmf().empty() && "compile resulted in null program?") failed.
[09/05/2024-11:41:10] [TRT] [E] 9: Skipping tactic 0x0000000000000000 due to exception [operation.cpp:finalize_cask_heuristic_result:1547] Platform (Cuda) error
[09/05/2024-11:41:10] [TRT] [W] Unable to determine GPU memory usage: an illegal memory access was encountered
[09/05/2024-11:41:10] [TRT] [W] Unable to determine GPU memory usage: an illegal memory access was encountered
[09/05/2024-11:41:10] [TRT] [W] Unable to determine GPU memory usage: an illegal memory access was encountered
[09/05/2024-11:41:10] [TRT] [W] Unable to determine GPU memory usage: an illegal memory access was encountered
[09/05/2024-11:41:12] [TRT] [E] 1: [defaultAllocator.cpp::allocate::20] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[09/05/2024-11:41:12] [TRT] [W] Requested amount of GPU memory (1387266048 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[09/05/2024-11:41:12] [TRT] [E] 9: Skipping tactic 0x0000000000000000 due to exception [tunable_graph.cpp:create:114] autotuning: User allocator error allocating 1387266048-byte buffer
[09/05/2024-11:41:12] [TRT] [E] 10: Could not find any implementation for node {ForeignNode[/tower/vision_tower/vision_model/embeddings/position_embedding/Constant_output_0…/projector/projector.4/Add]}.
[09/05/2024-11:41:12] [TRT] [E] 1: [cudaResources.cpp::~ScopedCudaStream::47] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[09/05/2024-11:41:12] [TRT] [E] 10: [optimizer.cpp::computeCosts::4048] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/tower/vision_tower/vision_model/embeddings/position_embedding/Constant_output_0…/projector/projector.4/Add]}.)
Traceback (most recent call last):
  File "/opt/nvidia/via/via-engine/models/vita20/trt_helper/build_visual_engine.py", line 285, in
    builder.build()
  File "/opt/nvidia/via/via-engine/models/vita20/trt_helper/build_visual_engine.py", line 69, in build
    build_vila_engine(args)
  File "/opt/nvidia/via/via-engine/models/vita20/trt_helper/build_visual_engine.py", line 252, in build_vila_engine
    build_trt_engine(args.model_type, image.shape[2], image.shape[3],
  File "/opt/nvidia/via/via-engine/models/vita20/trt_helper/build_visual_engine.py", line 136, in build_trt_engine
    raise RuntimeError("Failed building %s" % (engine_file))
RuntimeError: Failed building /root/.via/ngc_model_cache/nvidia_tao_vita_2.0.1_vila-llama-3-8b-lita/trt-engines/fp16/0-gpu/visual_engines/visual_encoder.engine
ERROR: Failed to build visual engine
2024-09-05 11:41:18,321 ERROR Failed to load VIA pipeline - Failed to generate TRT-LLM engine
Killed process with PID 49

You may be running out of memory. Can you try the GPT-4o API to see whether VIA runs without hosting VITA-2.0 locally?

Could you run nvidia-smi and attach the output? Judging from the attached log alone, it looks like a memory issue. Thanks.
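
If it helps, here is a minimal Python sketch for capturing the GPU memory numbers in a form that is easy to paste into a reply. It assumes nvidia-smi is on the PATH and relies on its --query-gpu CSV output. The helper names parse_gpu_memory and query_gpu_memory are made up for illustration, and the sketch does not handle fields reported as "[N/A]".

```python
import subprocess

# nvidia-smi query that prints one CSV row per GPU, values in MiB, no header.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse the CSV rows produced by QUERY into a list of dicts (MiB)."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split the index off the front and the two numbers off the back,
        # so a comma inside the GPU name would not break parsing.
        index, rest = line.split(",", 1)
        name, used, total = rest.rsplit(",", 2)
        used_mib, total_mib = int(used), int(total)
        rows.append({
            "index": int(index),
            "name": name.strip(),
            "used_mib": used_mib,
            "total_mib": total_mib,
            "free_mib": total_mib - used_mib,
        })
    return rows


def query_gpu_memory():
    """Run nvidia-smi (assumed to be installed) and return parsed rows."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Example use on the host, before starting the container:
#   for gpu in query_gpu_memory():
#       print(gpu)
```

Running this once before starting the container and once while the engine build is in progress should show whether free memory is actually low at the point of failure.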

Could you try the following steps to narrow down the issue?

  1. Delete the downloaded model / TRT engine directory and run the container again
  2. Reboot your machine and run the container again
  3. Run dmesg and check the log for any anomalies
  4. Make sure the VIA service is the only workload running on the system
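
For the dmesg check, a small Python sketch like the following can filter the kernel log down to lines that are usually worth a look. The pattern list is an assumption about what counts as an anomaly here (NVIDIA driver Xid reports and kernel out-of-memory messages), not an exhaustive list, and find_anomalies is a hypothetical helper name.

```python
import re

# Assumed markers of trouble: NVIDIA driver Xid reports and kernel OOM activity.
ANOMALY_PATTERN = re.compile(
    r"NVRM: Xid|Out of memory|oom-kill|oom_reaper",
    re.IGNORECASE,
)


def find_anomalies(dmesg_text):
    """Return the lines of a dmesg dump that match any known anomaly marker."""
    return [
        line for line in dmesg_text.splitlines()
        if ANOMALY_PATTERN.search(line)
    ]


# Example use (reading the kernel log may require root on some systems):
#   import subprocess
#   log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
#   for line in find_anomalies(log):
#       print(line)
```

An Xid line would point toward a GPU or driver fault, while an out-of-memory line would point toward host RAM being exhausted; an empty result does not rule either out.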