I am trying to run inference with the VILA 3B model, but I get this error when I run:
python3 scripts/launch_triton_server.py --world_size 1 --model_repo=multimodal_ifb/ --tensorrt_llm_model_name tensorrt_llm,multimodal_encoders --multimodal_gpu0_cuda_mem_pool_bytes 300000000
I am using:
container: nvcr.io/nvidia/tritonserver:24.11-trtllm-python-py3
transformers 4.43.4
tensorrt_llm 0.15.0
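To confirm that the versions above are the ones actually installed inside the container, a small check like the following can be run there. It is a minimal sketch that relies only on the standard-library `importlib.metadata`. It assumes the installed distribution names are `transformers` and `tensorrt_llm`. The helper name `installed_versions` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed from the versions listed above.
PACKAGES = ["transformers", "tensorrt_llm"]


def installed_versions(packages):
    """Return {package: installed version string, or None if not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this Python environment.
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```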