Hi all,
I tried the on-premises deployment of llama3-8b-instruct, following the deployment steps from the Docker section (sketched below).
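For context, this is roughly how the container was launched, paraphrased from the Docker instructions; the image tag and cache path are my assumptions and may differ from the current docs:

```bash
# Run the llama3-8b-instruct NIM container (a sketch based on the Docker
# deployment steps; image tag and cache path are assumptions on my side).
export NGC_API_KEY=<your NGC API key>
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

docker run -it --rm \
  --gpus all \
  --shm-size=16GB \
  -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama3-8b-instruct:1.0.0
```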
First of all, I observed that it took a lot of GPU memory (~36 GB), not the ~24 GB the docs mention (please check here).
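In case it helps with reproducing, GPU memory usage can be checked with a standard nvidia-smi query on the host while the container is serving (output format varies by driver version):

```bash
# Report per-GPU memory usage; run on the host while the model is loaded.
nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv
```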
Secondly, after the deployment succeeded, I used the command suggested on the page to test the model (roughly reproduced below).
But the results were incomprehensible; the output looked completely random.
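This is approximately the request I sent; the port (8000) and the sample prompt follow the docs, but my exact prompt may have differed:

```bash
# Chat completion request against the locally deployed endpoint
# (default port and sample prompt assumed from the docs page).
curl -X POST 'http://0.0.0.0:8000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "meta/llama3-8b-instruct",
        "messages": [{"role": "user", "content": "Write a limerick about the wonders of GPU computing."}],
        "max_tokens": 64
      }'
```

And here is the response I got: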
{"id":"cmpl-09ce782c884f40bebb696bcdbb333eb3",
"object":"chat.completion",
"created":1718767345,
"model":"meta/llama3-8b-instruct",
"choices":[{"index":0,"message":{"role":"assistant",
"content":"Cloud bathroom Of downloaded Return to You(eny Name-------------QAJa Lifetime Caught0 HmmUCHleg Do You Had Number Daughter Onlycccc Even${Sat.\r\n\r\n ThatTag)&SaCaughtLCForgery/Hex YourFolder#{ Sonra InsidelanguagesP Love Very MajorityDiscoverHelpArm And Herencmont Alone Q_Base(Pbab Tight WhoAre"},
"logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":22,"total_tokens":86,"completion_tokens":64}}%
Here is a picture of the service log:
Spec:
- GPU instance: A100 (40 GB)
- CPU cores: 128
- RAM: 256 GB
I am wondering if anyone has any ideas about this issue?
Thank you so much.
BR,
Chieh