How does Triton server manage many models?

If the model repository contains more models than a single GPU can hold at once (due to GPU memory limits), is there a scheduling policy that loads and unloads models dynamically? If so, what is the impact on inference latency when a request hits a model that is not currently loaded?
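To make the question concrete, here is a rough sketch of the scenario I have in mind, using the Python HTTP client. This assumes the server is running with explicit model control so models can be loaded/unloaded at runtime; the model names and URL are placeholders:

```python
# Sketch of the scenario (model names and URL are placeholders).
# Assumes the server was started with --model-control-mode=explicit so that
# models can be loaded and unloaded at runtime via the repository API.
import time
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Suppose "model_a" is currently resident on the GPU and "model_b" is not.
if not client.is_model_ready("model_b"):
    start = time.time()
    # Is there a policy that evicts another model here if GPU memory is full,
    # or must I unload one explicitly before this load can succeed?
    client.load_model("model_b")
    print(f"load latency: {time.time() - start:.2f}s")

# And if an inference request arrives for a model that is not loaded, is the
# load cost paid on the request path, or is the request rejected outright?
```

In other words, does Triton itself manage model residency across the repository, or is the client/operator expected to drive load/unload decisions like the above?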