AWS EC2 instance: g6e.2xlarge
docker run --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --tmpfs /tmp:exec --name via-server --gpus '"device=all"' -p 9000:9000 -p 8000:8000 -e BACKEND_PORT=8000 -e FRONTEND_PORT=9000 -e NVIDIA_API_KEY= -e NGC_API_KEY= -e VLM_MODEL_TO_USE=vita-2.0 -v ~/vita-2.0:/root/.via/ngc_model_cache -e MODEL_PATH="ngc:nvidia/tao/vita:2.0.1" -v via-hf-cache:/tmp/huggingface --privileged=true -e VLM_BATCH_SIZE=1 nvcr.io/metropolis/via-dp/via-engine:2.0-dp
[09/05/2024-11:40:42] [TRT] [I] Exporting onnx
[09/05/2024-11:40:51] [TRT] [I] Building TRT engine for visual_encoder
[09/05/2024-11:40:51] [TRT] [I] [MemUsageChange] Init CUDA: CPU +1, GPU +0, now: CPU 16148, GPU 1632 (MiB)
[09/05/2024-11:41:00] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1820, GPU +314, now: CPU 18104, GPU 1946 (MiB)
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 867413657
[09/05/2024-11:41:01] [TRT] [I] Succeeded parsing /root/.via/ngc_model_cache/nvidia_tao_vita_2.0.1_vila-llama-3-8b-lita/trt-engines/fp16/0-gpu/visual_engines/onnx/visual_encoder.onnx
[09/05/2024-11:41:02] [TRT] [I] Processed image dims 384x384
[09/05/2024-11:41:02] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
autotuner.cpp:560: DCHECK(!lpm.mvmf().empty() && "compile resulted in null program?") failed.
[09/05/2024-11:41:10] [TRT] [E] 9: Skipping tactic 0x0000000000000000 due to exception [operation.cpp:finalize_cask_heuristic_result:1547] Platform (Cuda) error
[09/05/2024-11:41:10] [TRT] [W] Unable to determine GPU memory usage: an illegal memory access was encountered
[09/05/2024-11:41:10] [TRT] [W] Unable to determine GPU memory usage: an illegal memory access was encountered
[09/05/2024-11:41:10] [TRT] [W] Unable to determine GPU memory usage: an illegal memory access was encountered
[09/05/2024-11:41:10] [TRT] [W] Unable to determine GPU memory usage: an illegal memory access was encountered
[09/05/2024-11:41:12] [TRT] [E] 1: [defaultAllocator.cpp::allocate::20] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[09/05/2024-11:41:12] [TRT] [W] Requested amount of GPU memory (1387266048 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[09/05/2024-11:41:12] [TRT] [E] 9: Skipping tactic 0x0000000000000000 due to exception [tunable_graph.cpp:create:114] autotuning: User allocator error allocating 1387266048-byte buffer
[09/05/2024-11:41:12] [TRT] [E] 10: Could not find any implementation for node {ForeignNode[/tower/vision_tower/vision_model/embeddings/position_embedding/Constant_output_0…/projector/projector.4/Add]}.
[09/05/2024-11:41:12] [TRT] [E] 1: [cudaResources.cpp::~ScopedCudaStream::47] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[09/05/2024-11:41:12] [TRT] [E] 10: [optimizer.cpp::computeCosts::4048] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/tower/vision_tower/vision_model/embeddings/position_embedding/Constant_output_0…/projector/projector.4/Add]}.)
Traceback (most recent call last):
  File "/opt/nvidia/via/via-engine/models/vita20/trt_helper/build_visual_engine.py", line 285, in <module>
    builder.build()
  File "/opt/nvidia/via/via-engine/models/vita20/trt_helper/build_visual_engine.py", line 69, in build
    build_vila_engine(args)
  File "/opt/nvidia/via/via-engine/models/vita20/trt_helper/build_visual_engine.py", line 252, in build_vila_engine
    build_trt_engine(args.model_type, image.shape[2], image.shape[3],
  File "/opt/nvidia/via/via-engine/models/vita20/trt_helper/build_visual_engine.py", line 136, in build_trt_engine
    raise RuntimeError("Failed building %s" % (engine_file))
RuntimeError: Failed building /root/.via/ngc_model_cache/nvidia_tao_vita_2.0.1_vila-llama-3-8b-lita/trt-engines/fp16/0-gpu/visual_engines/visual_encoder.engine
ERROR: Failed to build visual engine
2024-09-05 11:41:18,321 ERROR Failed to load VIA pipeline - Failed to generate TRT-LLM engine
Killed process with PID 49